Abstract — Motivated by the concept of probe storage, we study
the problem of information retrieval using a large array of N 
nano-mechanical probes, N ~ 4000. At the nanometer scale it 
is impossible to avoid errors in the positioning of the array, 
thus all signals retrieved by the probes of the array at a given 
sampling moment are affected by the same amount of random 
position jitter. Therefore a massively parallel probe storage device 
is an example of a noisy communication channel with long 
range correlations between channel outputs due to the global 
positioning errors. 

We find that these correlations have a profound effect on the
channel's properties. For example, it turns out that the channel's
information capacity does approach 1 bit per probe in the limit
of high signal-to-noise ratio, but the rate of the approach is only
polynomial in the channel noise strength. Moreover, any error
correction code with block size N ≫ 1 whose codewords
correspond to the instantaneous outputs of all probes in the
array exhibits an error floor independently of the code rate. We
illustrate this phenomenon explicitly using Reed-Solomon codes,
whose performance is easy to simulate numerically.

We also discuss capacity-achieving error correction codes for 
the global jitter channel and their complexity. 

Index Terms — Probe storage, maximum likelihood detection,
Shannon capacity, Gallager random coding bound, error exponent,
Fano inequality



I. Introduction 

THE invention of atomic force microscopy in 1986 [1]
opened the possibility of storing information at nanometer
scales, resulting in proposals for achieving areal information
densities of tens or even hundreds of terabits per square inch.
The basic idea is that information can be stored by altering
certain features of the storage medium at the scale of
nanometers. These changes can then be sensed and the
information retrieved by nanoscale probes similar to those used
in atomic force microscopy. For example, binary information
can be stored in crystalline dots created in amorphous media
or amorphous dots in crystalline media and retrieved using
electric probes (phase change storage, [2]). Alternatively, the
information can be stored using indentations or, more recently,
variable length grooves made in polymer media and retrieved
using thermoelectric probes (thermomechanical probe storage,
[3]).
Fig. 1. The layout of the 'Millipede', a thermomechanical probe storage device (labelled parts: cantilever array on CMOS chip; storage medium on MEMS scanner). Courtesy of IBM Zurich, [7].

The whole concept of storing and retrieving information
using nanoscale probes became known as probe storage; see
[4, Chapter 4] for a comprehensive review of the current state
of the field. Perhaps the most widely known probe storage
concept is IBM's 'Millipede' [8], an array of thermoelectric
probes with sharp tips used to create and sense indentations
in the polymer media. The layout of the Millipede is shown in
Fig. 1; the basic principle of thermomechanical reading and
writing is explained in Fig. 2. It has been demonstrated that
data can be retrieved with a low bit error rate per probe (about
$10^{-4}$) at densities of up to 2 terabits per square inch [6].

The particular feature of the Millipede shared by all existing
concepts of probe storage is the presence of a large ($\sim 2^{12}$)
number of probes reading and writing information in parallel.
This feature makes probe storage very different from more
traditional storage devices such as magnetic or optical disks
or flash memory. Each probe reads and writes information
in its own field. The array of probes moves as a whole to
allow each probe to explore its field. For areal densities of
several terabits per square inch, the array has to be moved by
a distance of the order of 10 nanometers from one set of
sampling points to the next and repositioned with sub-nanometer
precision for writing or reading. Inevitably, positioning errors
affect every single probe in the array. We refer to the combined
effect of errors in the positioning of the array at both the
reading and writing stages as global positioning jitter, or just
global jitter. The aim of this paper is to investigate the
performance of error correcting codes operating on the output
of the probe array subject to highly correlated disturbances due
to global jitter. We will both investigate the performance of
special codes (Reed-Solomon) and address the general question
of the existence of good error correction codes for the global
jitter channel by calculating the channel capacity and studying
Gallager's random coding bound.

Fig. 2. The principle of thermomechanical reading and writing. Top left
picture, writing bit 'zero': a cold probe pressed against the polymer surface
leaves no mark in the media. Top right picture, writing bit 'one': the probe
heated above the polymer's melting temperature and pressed against the
polymer surface leaves an indentation in the media, shown in the bottom left
picture. Bottom right pictures, reading bits 'one' and 'zero': the probe inserted
in the indentation is cooler than the probe pressed against the flat surface.
These temperature variations can be captured using thermoresistive sensors.
Courtesy of IBM Zurich, [8].

Despite the fairly abstract tools used in the paper, the
purpose of our investigation is to answer a very practical
question: given a 'good' communication channel (e.g. the
single probe's channel with bit error rate $10^{-4}$), how good
is a communication system (e.g. a Millipede) consisting of
thousands of good channels subjected to correlated error
events?

Our system-level analysis allows one to understand the advantages
and limitations of complicated communications devices
without actually building one. In particular, it turns out that
the information theoretical performance limits derived in this
paper have crucial system-level implications for the design
of error correction codes for probe storage.

The rest of the paper is organised as follows. In Section II
we introduce a simple model for the probe storage channel,
the global jitter channel, which accounts for the effects of
the probe array's position jitter, and calculate the probability
density function of the signal amplitude for a Gaussian isolated
pulse response and Gaussian statistics of jitter. In Section III
we introduce a low complexity signal detection scheme for the
global jitter channel and prove its optimality in the limit of
large array sizes. In Section IV we calculate the block error rate
for non-interleaved Reed-Solomon codes applied to the probe
storage channel and analyze their error floor behaviour. In
Section V we calculate Shannon's capacity of the global jitter
channel and show that it approaches 1 bit according to a power
law in the limit of large signal-to-noise ratio. In Section VI we
calculate the average block error rate for non-interleaved codes
sampled from Gallager's ensemble (the random coding bound,
or RCB) and show that it exhibits an error floor behaviour as
a function of SNR. In Section VII we show that there exist
no non-interleaved codes with a positive rate which can be
used for error-free retrieval of information using large arrays
of probes in the presence of global jitter. In Section VIII we
discuss minimal requirements for capacity-achieving codes for
global jitter channels. We conclude our work with Section IX,
which summarises the results of our investigation.
For completeness, we present the derivations of the more technical
results obtained in the paper in the Appendix.

II. Channel Model 

We consider a system consisting of an array of N probes
reading/writing in parallel. We assume that channel coding has
been used (an RLL code for example) and that the symbol
pitch is large enough so that inter-symbol interference can be
ignored. The sampled readback signal at the k-th probe at the
t-th moment in time is modelled as:

$$r_t^{(k)} = p(J_t)\,a_t^{(k)} + \sigma n_t^{(k)}, \qquad k = 1, 2, \ldots, N \qquad (1)$$

where $a_t^{(k)} \in \{0, 1\}$ is the bit written to the medium by the
k-th probe at time t, $J_t$ is the global positioning error (jitter)
at time t, $p(J)$ is the channel impulse response and $\{n_t^{(k)}\}$ is
a sequence of random variables modelling the combined effect
of electronics and media noise.

Experiments carried out at IBM Zurich [9] have confirmed
that for thermomechanical storage media the combined
electronics/media noise is well modelled by Gaussian random
variables. Thus we assume that $\{n_t^{(k)}\}$ are independent,
identically distributed Gaussian variables with mean zero and unit
variance. The parameter $\sigma$ is the standard deviation of the resulting
additive white Gaussian noise (AWGN).

It has also been demonstrated experimentally that when
using a Millipede-like positioning control loop the random
jitter $J_t$ is also well modelled by a mean-zero Gaussian random
variable [9]. Note that this observation applies to any probe
storage device, for instance a system based on phase change
media, as long as a similar positioning system is used. Let $\sigma_J$
be the standard deviation of the Gaussian jitter.

Most of the qualitative results reported below do not depend 
on the detailed assumptions about the statistics of jitter or the 
precise shape of the impulse response. For quantitative analysis 
and numerical simulations of the global jitter channel we will 
use the Gaussian impulse response: 

$$p(J) = e^{-J^2/W^2} \qquad (2)$$

where W is a parameter related to the pulse width. In what follows
it will be more convenient to work directly with the random
variable $p(J) \in (0, 1)$, which measures the signal amplitude
degradation due to position jitter J. A calculation leads to
the following answer for the probability density function of p:

$$p_p(p) = \mathbb{E}_J\,\delta\big(p - p(J)\big) = \frac{W}{\sqrt{2\pi}\,\sigma_J}\,\frac{p^{W^2/(2\sigma_J^2) - 1}}{\sqrt{\ln(1/p)}}, \qquad 0 < p < 1 \qquad (3)$$
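As a sanity check of the density (3), one can compare it with a histogram of simulated amplitudes. The sketch below is a minimal illustration assuming NumPy, the Gaussian pulse (2) and the illustrative values W = 0.5, σ_J = 0.2 used later in the paper; `amplitude_pdf` is a hypothetical helper name.

```python
import numpy as np

def amplitude_pdf(p, W, sigma_J):
    """Closed-form density (3) of p = exp(-J^2 / W^2) with J ~ N(0, sigma_J^2)."""
    p = np.asarray(p, dtype=float)
    return (W / (np.sqrt(2.0 * np.pi) * sigma_J)
            * p ** (W**2 / (2.0 * sigma_J**2) - 1.0)
            / np.sqrt(np.log(1.0 / p)))

# Draw Gaussian jitter, map it through the Gaussian pulse (2), histogram p.
rng = np.random.default_rng(0)
W, sigma_J = 0.5, 0.2
J = rng.normal(0.0, sigma_J, size=1_000_000)
p_samples = np.exp(-J**2 / W**2)

counts, edges = np.histogram(p_samples, bins=50, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = amplitude_pdf(centers, W, sigma_J)

# Away from the integrable singularity at p = 1 the two should agree closely.
mask = (centers > 0.35) & (centers < 0.9)
print(np.max(np.abs(counts[mask] / predicted[mask] - 1.0)))
```

The comparison is restricted to the middle of the range because the density has an integrable singularity at p = 1 and is very small near p = 0, where histogram bins are dominated by sampling noise.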



Finally we define the signal-to-noise ratio (SNR) of the global
jitter channel measured in decibels as follows:

$$\mathrm{SNR} = 10\log_{10}\frac{1}{\sigma^2} \qquad (4)$$

III. Optimal Channel Detector



The global jitter channel is characterised by strong equal-time
correlations between the outputs of all N probes in the array.
In the limit $N \to \infty$ it is possible to exploit these correlations
and derive an asymptotically optimal low-complexity detection
scheme.

To motivate the rigorous argument given below, let us ask
the following question: what is the optimal channel detector
conditional on knowledge of the value of jitter at time
t? Conditional on the known amplitude value $p_t = p(J_t) = p$,
the channel outputs $r_t^{(k)}$ are independent. Consequently, for
equiprobable inputs the optimal maximum a posteriori (MAP) detector [11] is simply
the collection of N independent optimal threshold detectors
for the AWGN channel:

$$\hat a_t^{(k)} = \begin{cases} 1 & \text{if } r_t^{(k)} > p/2, \\ 0 & \text{if } r_t^{(k)} < p/2. \end{cases} \qquad (5)$$



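The problem with the above 'Genie-assisted' detector is
that the value of the amplitude p is a priori unknown.
However, the value of p can be reliably estimated from the
string $r_t^{(1)}, r_t^{(2)}, \ldots, r_t^{(N)}$ if N is sufficiently large. The key
observation concerns the sample average of all N received
signals at time t:

$$R_t = \frac{1}{N}\sum_{k=1}^{N} r_t^{(k)} = p_t\,\frac{1}{N}\sum_{k=1}^{N} a_t^{(k)} + \sigma\,\frac{1}{N}\sum_{k=1}^{N} n_t^{(k)} \qquad (6)$$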
Assume that the channel inputs $a_t^{(k)}$ are independent and uniformly
distributed. Then by the strong law of large numbers
[10], both sample averages in (6) converge almost surely to
their expected values as $N \to \infty$:

$$\frac{1}{N}\sum_{k=1}^{N} a_t^{(k)} \to \frac{1}{2}, \qquad \frac{1}{N}\sum_{k=1}^{N} n_t^{(k)} \to 0 \qquad (7)$$



Substituting into (6) we find that in the large-N limit it is actually
possible to compute the previously unknown distortion $p_t = p(J_t)$ as follows:

$$p_t = \lim_{N\to\infty} 2R_t \qquad (8)$$

Feeding this estimate of p to the N independent threshold detectors (5),
we obtain estimates $\hat a_t^{(1)}, \hat a_t^{(2)}, \ldots, \hat a_t^{(N)}$ of the data bits, which
should become optimal in the limit $N \to \infty$.



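The heuristic argument given above leads to the following
asymptotically optimal detection algorithm for the global jitter
channel:

1) Input: channel outputs $r_t^{(1)}, r_t^{(2)}, \ldots, r_t^{(N)}$ at time t.
2) Estimate the threshold:
$$T_N = \frac{1}{N}\sum_{k=1}^{N} r_t^{(k)} \qquad (9)$$
3) Perform bit-by-bit detection:
$$\hat a_t^{(k)} = \begin{cases} 1 & \text{if } r_t^{(k)} > T_N, \\ 0 & \text{if } r_t^{(k)} < T_N. \end{cases} \qquad (10)$$
4) Output: the estimate of the channel input $\hat a_t^{(1)}, \hat a_t^{(2)}, \ldots, \hat a_t^{(N)}$.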
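The four steps above can be sketched as a small Monte Carlo experiment that runs the LLN-detector (9)-(10) and the Genie-detector (5) on the same simulated reads. This is an illustrative sketch, not the simulation code behind Fig. 3; it assumes NumPy, the Gaussian pulse (2), SNR defined as $10\log_{10}(1/\sigma^2)$, and hypothetical parameter defaults.

```python
import numpy as np

def simulate_ber(N=1000, n_reads=1000, W=0.5, sigma_J=0.2, snr_db=20.0, seed=1):
    """Monte Carlo BER of the LLN detector (9)-(10) and the Genie detector (5)
    on the global jitter channel (1) with the Gaussian pulse (2)."""
    rng = np.random.default_rng(seed)
    sigma = 10.0 ** (-snr_db / 20.0)                  # assumes SNR = 10 log10(1/sigma^2)
    a = rng.integers(0, 2, size=(n_reads, N))          # i.u.d. input bits
    J = rng.normal(0.0, sigma_J, size=(n_reads, 1))    # one jitter value per array read
    p = np.exp(-J**2 / W**2)                           # amplitude shared by all N probes
    r = p * a + sigma * rng.normal(size=(n_reads, N))  # readback signals, eq. (1)

    T_N = r.mean(axis=1, keepdims=True)                # threshold estimate, eq. (9)
    a_lln = (r > T_N).astype(int)                      # bit-by-bit detection, eq. (10)
    a_genie = (r > p / 2).astype(int)                  # Genie threshold p/2, eq. (5)
    return np.mean(a_lln != a), np.mean(a_genie != a)

ber_lln, ber_genie = simulate_ber()
print(f"LLN BER = {ber_lln:.2e}, Genie BER = {ber_genie:.2e}")
```

Because each read shares a single jitter value across all N probes, the jitter is drawn with shape `(n_reads, 1)` and broadcast across the array; this is exactly the long-range correlation the LLN-detector exploits.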



We will now prove that detector (10) is indeed optimal in
the limit $N \to \infty$. Firstly, note that detector (5) is an optimal
MAP detector which infers the most probable channel input
conditional on $r_t, p_t$. Therefore its bit error rate is smaller than or
equal to the bit error rate of any detector which infers the channel input
conditional on the channel output $r_t$ only. So we can establish the
asymptotic optimality of (10) by proving that its bit error rate
approaches the bit error rate of (5) in the limit $N \to \infty$:

$$\lim_{N\to\infty}\left|\Pr\big(\hat a_t^{(k)} \neq a_t^{(k)}\big) - \Pr\big(\tilde a_t^{(k)} \neq a_t^{(k)}\big)\right| = 0 \qquad (11)$$

where $\hat a_t^{(k)}$ denotes the output of detector (10) and $\tilde a_t^{(k)}$ the output of detector (5).
A long but straightforward calculation presented in Appendix
A shows that, with $\hat a_t^{(k)}$ and $\tilde a_t^{(k)}$ the outputs of detectors (10) and (5) respectively,

$$\left|\Pr\big(\hat a_t^{(k)} \neq a_t^{(k)}\big) - \Pr\big(\tilde a_t^{(k)} \neq a_t^{(k)}\big)\right| \le \frac{3\,(1 \ldots)}{(2\pi N)^{1/3}} \qquad (12)$$

Taking the large-N limit of both sides of the above inequality
we arrive at (11). Thus the asymptotic optimality of (10) is
established. We will refer to detector (10) as the LLN-detector
and to detector (5) as the Genie-detector.

The performance of the LLN-detector for large but finite N
is compared with the performance of the Genie-detector in Fig. 3
for the Gaussian pulse (2) and various values of jitter strength $\sigma_J$.
The size of the probe array is N = 1000, the pulse width is
W = 0.5 and the simulation has been run for $10^5$ consecutive
readings of the entire probe array. As a reference, the performance
of the channel with no jitter, which corresponds to the
classical binary-input additive white Gaussian noise (AWGN)
channel, has been included. It is observed that as the jitter strength
is increased the BER performance degrades significantly, but
in all cases the performance of the LLN-detector is virtually
indistinguishable from that of the Genie-assisted detector. It is interesting
to note that the gap between the BER of the Genie-detector and the
LLN-detector increases with SNR, see Fig. 3. This observation
is consistent with the bound (12): the right-hand side of the
bound grows if the block size N is kept fixed but the additive
noise strength σ is reduced. The intuition behind the observed
divergence is very simple: for smaller noise one needs a more
precise threshold estimate to stay near the performance of
the Genie-detector. The precision of the threshold estimate
depends on the rate of convergence of the sample sum and
scales as $1/\sqrt{N}$ in accordance with the Central Limit Theorem
(see [10] for a review).
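The $1/\sqrt{N}$ scaling of the threshold error can be checked numerically. The sketch below, assuming NumPy and a fixed, hypothetical amplitude p = 0.8 and noise σ = 0.1, compares the empirical spread of $T_N - p/2$ with the CLT prediction $\sqrt{p^2/4 + \sigma^2}/\sqrt{N}$, which follows from $\mathrm{Var}(r_t^{(k)} \mid p) = p^2/4 + \sigma^2$ for uniform input bits.

```python
import numpy as np

def threshold_error_std(N, p=0.8, sigma=0.1, trials=2000, seed=2):
    """Empirical standard deviation of T_N - p/2 for the estimator (9)
    at a fixed (known) amplitude p, with i.u.d. input bits."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=(trials, N))
    r = p * a + sigma * rng.normal(size=(trials, N))
    return np.std(r.mean(axis=1) - p / 2)

def clt_prediction(N, p=0.8, sigma=0.1):
    # Var(r) = p^2 Var(a) + sigma^2 = p^2/4 + sigma^2 for a uniform on {0, 1}
    return np.sqrt(p**2 / 4 + sigma**2) / np.sqrt(N)

for N in (100, 400, 1600):
    print(N, threshold_error_std(N), clt_prediction(N))
```

Quadrupling N halves the threshold error, which is why, at fixed N, shrinking σ eventually makes the threshold error comparable to the noise and opens the gap seen in Fig. 3.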
Fig. 3. BER performance comparison of the LLN detector against the ideal
detector with perfect knowledge of global jitter (N = 1000, $10^5$ blocks simulated, W = 0.5).



It is interesting to compare the LLN detector to the simplest
detection scheme proposed for the Millipede, see [12]. The
Millipede detector consists of N independent MAP detectors
(one detector per probe). The k-th detector estimates the data
bit $a_t^{(k)}$ at time t conditional on the output $r_t^{(k)}$ only. Unlike
the LLN-detector, there is no sharing of information between
individual detectors.

The maximum-likelihood detector of the single probe output
modelled by (1) is a simple threshold detector: the most likely
bit given the output of the k-th probe, $\hat a_t^{(k)}$, is given by:

$$\hat a_t^{(k)} = \begin{cases} 1 & \text{if } r_t^{(k)} > r_0, \\ 0 & \text{otherwise,} \end{cases} \qquad (13)$$

where $r_0$ is the optimal threshold, a number between 0 and
1 which can be either measured experimentally or computed
theoretically for a given channel model by solving the
maximum likelihood equation (equating the conditional densities of the readback signal)

$$\Pr\big(r_t^{(k)} = r \mid a_t^{(k)} = 1\big) = \Pr\big(r_t^{(k)} = r \mid a_t^{(k)} = 0\big),$$

see [13] for more details.
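For the Gaussian model of Section II, the maximum likelihood equation can be solved numerically: the density of $r_t^{(k)}$ given a stored 'one' is a Gaussian averaged over the jitter-dependent amplitude, while given a 'zero' it is a plain Gaussian. The sketch below is one possible approach (a fine-grid average over jitter plus bisection), assuming NumPy, equiprobable bits and hypothetical default values W = 0.5, σ_J = 0.2; `ml_threshold` is a hypothetical helper name.

```python
import numpy as np

def ml_threshold(sigma, W=0.5, sigma_J=0.2, tol=1e-10):
    """Solve p(r | a=1) = p(r | a=0) for a single probe of channel (1) with jitter.

    The a=1 density is averaged over Gaussian jitter J ~ N(0, sigma_J^2) with a
    simple Riemann sum on a fine grid of standardised jitter values.
    """
    x = np.linspace(-10.0, 10.0, 20001)          # J / sigma_J
    w = np.exp(-0.5 * x**2)
    w /= w.sum()                                 # discrete N(0, 1) weights
    p_nodes = np.exp(-(sigma_J * x)**2 / W**2)   # amplitudes p(J), eq. (2)

    def normal_pdf(z):
        return np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma)

    def g(r):
        f1 = np.sum(w * normal_pdf((r - p_nodes) / sigma))  # density given a = 1
        f0 = normal_pdf(r / sigma)                          # density given a = 0
        return f1 - f0

    # g(0) < 0 < g(0.5) whenever sigma_J > 0, so the root lies in (0, 0.5).
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(ml_threshold(0.1))   # below 0.5: jitter pulls the threshold down
```

Since every amplitude satisfies $0 < p \le 1$, the 'one' density always sits closer to $r = 1/2$ than the 'zero' density does, so the threshold falls below the jitter-free value 1/2; as $\sigma_J \to 0$ it returns to 1/2.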

In Fig. 4 the BER of the Millipede detection scheme is
compared with that of the LLN-detector for two values of
jitter strength $\sigma_J$. The result is quite striking: the optimal
detector for the global jitter channel outperforms the set
of N independent threshold detectors by over a decibel at
BER = $10^{-4}$.

Finally, let us analyse the complexity of the LLN-detector.
The complexity of adding N fixed-precision numbers in (9)
scales as $N\log N$ and the complexity of the detection step (10)
is $O(N)$, so the overall detection complexity is $O(N\log N)$ and
the detection complexity per detected bit is $O(\log N)$.

Having constructed the optimal detection scheme we can 
investigate the performance of error correction codes for the 
global jitter channel starting with Reed-Solomon codes which 
featured in the original Millipede proposal. 
Fig. 4. Comparison of BER performance of the LLN-detector for the global
jitter channel and the set of N independent threshold detectors.



IV. The Performance of Reed-Solomon Codes for the Global Jitter Channel

The typical probe storage array size N is 64 × 64 = 4096,
which is close to the sector size of the previous generation
of hard disk drives¹. The simplest error correction coding
(ECC) scheme for probe storage follows the example of hard
drives: for the latter, data is encoded sector-by-sector; for the
former, ECC is applied independently to K-bit strings of data
$I_1, I_2, I_3, \ldots$ to be recorded on the media by the probe array
at times $t_1, t_2, \ldots$. We refer to this application of
error correction coding as non-interleaved, meaning that channel
outputs corresponding to different sampling times
cannot belong to the same code block.

The non-interleaved ECC block size is equal to the number
of probes N, and the ECC rate is R = K/N. During the reading
stage, the single-time output $\hat a_t^{(1)}, \hat a_t^{(2)}, \ldots, \hat a_t^{(N)}$ of the channel
detector is fed into the ECC decoder, resulting in an estimate
$\hat I_t$ of the K recorded bits at each sampling time t.

We start our study of error correction for the global jitter
channel with classical Reed-Solomon (RS) codes; see [14] for a
review. These are $(N_s, K_s)$ symbol block codes over the
Galois field $GF(2^n)$. Here $N_s = N/n$ is the number of RS
symbols per block and $K_s = K/n$ is the number of information
symbols. Symbols are represented by n-bit binary strings.
The maximal block size of the code is $N_{max} = 2^n - 1$
symbols. An RS code with any block size $N_s < N_{max}$ can
be constructed by treating the missing $(N_{max} - N_s)$ symbols as
zeros (this operation is called shortening). A Reed-Solomon
code with rate $R = K_s/N_s$ can correct up to

$$\left\lfloor \frac{N_s - K_s}{2} \right\rfloor$$

incorrectly detected symbols; its minimum distance $N_s - K_s + 1$
meets the Singleton bound, which makes it a maximum
distance separable (MDS) code.
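The arithmetic linking symbol-level RS parameters to the probe array is simple but easy to get wrong; the sketch below tabulates it for the rate-0.8, n = 10 codes simulated below. `rs_parameters` is a hypothetical helper, and rounding $K_s = R N_s$ to an integer is an assumption (the exact rate of a real code then differs slightly from 0.8).

```python
def rs_parameters(N_s, R, n):
    """Hypothetical helper: (K_s, t, probes) for an (N_s, K_s) Reed-Solomon code
    over GF(2^n) applied without interleaving, one bit per probe."""
    N_max = 2**n - 1
    if N_s > N_max:
        raise ValueError("RS block length cannot exceed 2^n - 1 symbols")
    K_s = round(R * N_s)        # information symbols, rounded to an integer
    t = (N_s - K_s) // 2        # correctable symbol errors, floor((N_s - K_s)/2)
    return K_s, t, n * N_s      # n bits per symbol -> n * N_s probes

for N_s in (255, 511, 1023):
    print(N_s, rs_parameters(N_s, R=0.8, n=10))
# 255 (204, 25, 2550)
# 511 (409, 51, 5110)
# 1023 (818, 102, 10230)
```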

¹January 2011 has been designated as the date of the final transition from
the 512 B sector size in HDDs to the 4 KB sector size.
The fact that the error event of an RS code depends on
the number of incorrectly detected symbols rather than bits
makes it a very good code for channels dominated by relatively
short bursts of noise. It is therefore not very well suited to
the global jitter channel, where a single jitter event affects every
symbol in the block. However, we will see below that
conclusions drawn from analyzing the performance of RS
codes in the presence of global jitter can be applied to any
non-interleaved block ECC.

We start by presenting the results of numerical simulations.
Fig. 5 illustrates the performance of rate-0.8 RS codes with
symbol size n = 10 bits. Three codes are considered: one with
block size $N_s = 1023$ (the maximal block size) and two
shortened RS codes with $N_s = 511$ and $N_s = 255$. These
block sizes correspond to hypothetical probe arrays with 10230,
5110 and 2550 probes respectively. To model global jitter we use the
channel (1) with jitter strength $\sigma_J = 0.2$ and pulse width
W = 0.5. For each of the codes we measure the probability
of sector error, or sector error rate (SER), as a function of
signal-to-noise ratio (SNR). SER is estimated as the fraction
of received strings in which the number of symbol errors
exceeded the RS threshold $N_s(1-R)/2$. The total number of strings is
$10^6$, which ensures that the total error count used to estimate SER
is at least $10^2$ for the highest SNR point.

The first striking feature of the curves presented in Fig. 5 is
that there is no discernible performance loss when using the shortened
codes. This is in stark contrast to the known behaviour of
RS codes for the AWGN channel, where

$$\frac{\log \mathrm{SER}_{RS}(N_1, R)}{\log \mathrm{SER}_{RS}(N_2, R)} \approx \frac{N_1}{N_2}.$$

Therefore, the numerical evidence suggests that the probability
of sector error of RS codes applied to the global jitter channel does
not depend on the code's block size.

The second unusual feature of the SER vs. SNR curves shown
in Fig. 5 is the exponential decay of SER with
SNR: all the curves look like straight lines on the semi-logarithmic
plot. The exponential rather than 'waterfall'
shape of SER curves is often referred to as an 'error floor'.
Its appearance is normally attributed to the use of suboptimal
decoding algorithms such as belief propagation, rather than to
channel properties. Here we are driven to the conclusion that
non-interleaved RS codes decoded with an optimal maximum
likelihood hard-input algorithm exhibit an error floor, the
position of which is independent of the block size.
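A lightweight way to reproduce the block-size independence is to simulate the number of symbol errors per sector directly rather than the full RS decoder. The sketch below assumes NumPy; that, given the amplitude p, symbol errors are i.i.d. with rate $e_1(p) = 1 - (1 - Q(p/(2\sigma)))^n$ (Genie threshold, independent bit errors within a symbol); and that a sector fails when the error count exceeds $\tau N_s$ with $\tau = (1-R)/2$. The parameter values and helper names are hypothetical. It also evaluates $\Pr(p_t < p_c)$, the floor that the analysis below predicts.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def symbol_error_rate(p, sigma, n):
    """e_1(p) = 1 - (1 - Q(p / (2 sigma)))^n: n-bit symbol error rate at amplitude p,
    assuming the Genie threshold p/2 and independent bit errors within a symbol."""
    q = 0.5 * _erfc(np.asarray(p) / (2.0 * sigma * math.sqrt(2.0)))
    return 1.0 - (1.0 - q) ** n

def sector_error_rate(N_s, n_sectors, R=0.8, n=10, W=0.5, sigma_J=0.2, sigma=0.1, seed=3):
    """Monte Carlo SER of a non-interleaved RS code with bounded-distance decoding."""
    rng = np.random.default_rng(seed)
    tau = (1.0 - R) / 2.0                            # correctable fraction
    J = rng.normal(0.0, sigma_J, size=n_sectors)     # one jitter value per sector
    p = np.exp(-J**2 / W**2)                         # Gaussian pulse, eq. (2)
    # Given p, symbol errors are i.i.d., so their number is binomial.
    errors = rng.binomial(N_s, symbol_error_rate(p, sigma, n))
    return np.mean(errors > tau * N_s)

def floor_prediction(R=0.8, n=10, W=0.5, sigma_J=0.2, sigma=0.1):
    """Pr(p_t < p_c), where e_1(p_c) = tau."""
    tau = (1.0 - R) / 2.0
    lo, hi = 1e-9, 1.0                               # e_1 decreases with p
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if symbol_error_rate(mid, sigma, n) > tau:
            lo = mid
        else:
            hi = mid
    p_c = 0.5 * (lo + hi)
    J_c = W * math.sqrt(math.log(1.0 / p_c))
    return math.erfc(J_c / (sigma_J * math.sqrt(2.0)))  # Pr(|J| > J_c)

for N_s in (255, 1023):
    print(N_s, sector_error_rate(N_s, 200_000))
print("Pr(p_t < p_c) =", floor_prediction())
```

In this toy model both block sizes land close to $\Pr(p_t < p_c)$: sectors fail essentially whenever the jitter pushes the amplitude below the critical value, regardless of how long the codeword is.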

As it turns out, the breakdown of Reed-Solomon codes for
channels with global jitter can be understood analytically using
the machinery of large deviations [16].

To achieve this, we need to introduce some notation. Let

$$\tau = \frac{1 - R}{2} \qquad (14)$$

be the maximal fraction of correctable symbols for the code
at hand. Let the random variable $\xi_k$ take values $\{0, 1\}$ with
probabilities $\{e_0, e_1\}$ respectively. Here $e_1$ is the probability
that the k-th symbol is detected incorrectly. Due to the statistical
homogeneity of the channel model (1), the probability of
symbol error does not depend on the symbol index k. The
event $\xi_k = 1$ corresponds to the k-th symbol being detected
erroneously and $\xi_k = 0$ to the k-th symbol being detected
correctly. Conditionally on the event $p_t = p$, the probability
of RS sector error is simply given by a binomial formula
[15]. Unfortunately, this formula is not very useful
for quantitative analysis in the region of low SER. Instead,
we are going to use simple asymptotic expressions for the RS
sector error rate based on Cramér's theory [16]. The event of
sector error corresponds to the fraction of incorrectly detected
symbols exceeding the RS threshold (14). Hence, the probability
of a block being decoded incorrectly can be written as:

$$\Pr(SE \mid p_t = p) = \Pr\Big(\sum_{k=1}^{N_s} \xi_k > \tau N_s \,\Big|\, p_t = p\Big) \qquad (15)$$

An application of Chernoff's bound [10] results in the following
upper bound on $\Pr(SE)$ for any $\lambda > 0$:

$$\Pr(SE \mid p_t = p) \le e^{-\lambda\tau N_s}\,\mathbb{E}\Big[e^{\lambda\sum_{k=1}^{N_s}\xi_k} \,\Big|\, p_t = p\Big], \qquad (16)$$

where $\mathbb{E}[\,\cdot \mid p_t = p]$ stands for the $p_t$-conditional expectation
value. Recall that conditional on the value of the signal amplitude
$p_t$ the global jitter channel is memoryless. Therefore symbol
error events are conditionally independent and identically
distributed, and the bound in (16) can be rewritten as:



$$\Pr(SE \mid p_t = p) \le e^{-\lambda\tau N_s + N_s\ln\mathbb{E}[e^{\lambda\xi} \mid p_t = p]} \qquad (17)$$

The bound in (17) holds for any $\lambda > 0$ and thus we can choose
the tightest bound possible by minimising over all λ:



$$\frac{1}{N_s}\ln\Pr(SE \mid p_t = p) \le I(\tau, p) \qquad (18)$$

The function $I(\tau, p)$ is known in the theory of large deviations as
the rate function and is given by:



$$I(\tau, p) = \inf_{\lambda > 0}\Big(-\lambda\tau + \ln\mathbb{E}\big[e^{\lambda\xi} \,\big|\, p_t = p\big]\Big) \qquad (19)$$

The question remains: is the bound in (18) tight enough to be
useful? The answer is provided by an application of Cramér's
theorem [16], which states that provided $\mathbb{E}[\xi \mid p_t = p] < \tau$
(which is certainly true in the limit of low sector error rates)
the bound (18) is tight in the limit of large block size:

$$\lim_{N_s\to\infty}\frac{1}{N_s}\ln\Pr(SE \mid p_t = p) = I(\tau, p) \qquad (20)$$



An explicit expression for the rate function (19) can be found
by solving the critical point equation:

$$\frac{\partial}{\partial\lambda}\Big(-\lambda\tau + \ln\mathbb{E}\big[e^{\lambda\xi} \,\big|\, p_t = p\big]\Big) = 0 \qquad (21)$$

The function differentiated in (21) is convex, with the unique
point of global minimum given by:

$$\lambda_c = \ln\frac{e_0(p)\,\tau}{e_1(p)\,(1 - \tau)} \qquad (22)$$

where $e_1(p) = \mathbb{E}(\xi \mid p_t = p)$ is the conditional symbol error
rate and $e_0(p) = 1 - e_1(p)$. If the condition of Cramér's theorem
$\mathbb{E}[\xi \mid p_t = p] = e_1(p) < \tau$ is satisfied, then $\lambda_c > 0$ and,
substituting back into (19), we find that the rate function can be
expressed as a Kullback-Leibler divergence [11]:



$$I(\tau, p) = -D_{KL}\big((1 - \tau, \tau)\,\big\|\,(1 - e_1(p), e_1(p))\big) \qquad (23)$$

Recall that for any two stochastic vectors P and Q,

$$D_{KL}(P \,\|\, Q) = \sum_k P_k \ln\frac{P_k}{Q_k} \qquad (24)$$

On the other hand, if $e_1(p) > \tau$ then $\lambda_c < 0$ and, due to
convexity, the infimum in (19) is attained as $\lambda \to 0$, resulting
in the trivial bound $I(\tau, p) = 0$.

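Let us summarise our findings so far:

$$\Pr(SE \mid p_t = p) \le e^{N_s I(\tau, p)} \qquad (25)$$

where

$$I(\tau, p) = \begin{cases} -D_{KL}\big((1 - \tau, \tau)\,\|\,(e_0(p), e_1(p))\big) & \text{if } e_1(p) < \tau, \\ 0 & \text{if } e_1(p) \ge \tau. \end{cases} \qquad (26)$$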
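The closed form (26) can be cross-checked against the variational definition (19) by brute-force minimisation over a grid of λ values. The sketch below assumes NumPy, Bernoulli symbol errors with rate $e_1$, and a hypothetical grid $0 < \lambda \le 20$ (wide enough for the error rates tried here); the function names are illustrative.

```python
import numpy as np

LAMBDAS = np.linspace(1e-6, 20.0, 200_001)   # grid of Chernoff parameters, lambda > 0

def rate_function_numeric(tau, e1):
    """I(tau, p) from (19): inf over lambda > 0 of -lambda*tau + ln E[exp(lambda*xi)],
    where xi is Bernoulli(e1). Evaluated by brute force on a lambda grid."""
    values = -LAMBDAS * tau + np.log((1.0 - e1) + e1 * np.exp(LAMBDAS))
    return float(values.min())

def rate_function_closed(tau, e1):
    """Closed form (23)/(26): -D_KL((1-tau, tau) || (1-e1, e1)) if e1 < tau, else 0."""
    if e1 >= tau:
        return 0.0
    return -(tau * np.log(tau / e1) + (1.0 - tau) * np.log((1.0 - tau) / (1.0 - e1)))

for e1 in (1e-4, 1e-2, 5e-2, 0.2):
    print(e1, rate_function_numeric(0.1, e1), rate_function_closed(0.1, e1))
```

For $e_1 \ge \tau$ the grid minimum sits at the smallest λ, reproducing the trivial value $I = 0$; for $e_1 < \tau$ the minimiser is the $\lambda_c$ of (22).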

Now we can derive an upper bound for the performance of
non-interleaved RS codes in the global jitter channel using the
upper bound (25)-(26) on the conditional SER:

$$\Pr(SE) = \int_0^1 d\mu_p(p)\,\Pr(SE \mid p_t = p), \qquad (27)$$

where $d\mu_p(p) = p_p(p)\,dp$ is the probability measure of the jitter-dependent
signal amplitude with density (3). The range of
integration can now be split around the critical value of signal
degradation $p_c$, which is the unique solution of the equation

$$e_1(p_c) = \tau.$$



Then, using the upper bound (25)-(26) for the conditional probability
of RS error, we find:

$$\Pr(SE) \le \int_0^{p_c} d\mu_p(p) + \int_{p_c}^1 d\mu_p(p)\,e^{-N_s F_\tau(p)} \qquad (28)$$

where $F_\tau(p)$ is expressed in terms of the Kullback-Leibler divergence:

$$F_\tau(p) = D_{KL}\big((1 - \tau, \tau)\,\|\,(e_0(p), e_1(p))\big).$$

Note that $F_\tau(p)$ has a unique non-degenerate minimum at
$p_c$, where it takes the value $F_\tau(p_c) = 0$. This fact
follows easily from Gibbs' inequality [17]. Therefore we
can apply Laplace's method [18] to the second integral
in (28) to derive the large-$N_s$ asymptotics of the
probability of sector error. Since the minimum sits at the lower
end of the integration range, only half of the Gaussian peak contributes:

$$\int_{p_c}^1 d\mu_p(p)\,e^{-N_s F_\tau(p)} \simeq p_p(p_c)\sqrt{\frac{\pi}{2N_s F_\tau''(p_c)}} \qquad (29)$$

Note that the resulting expression (29) tends to zero as $N_s^{-1/2}$
in the limit $N_s \to \infty$. Therefore what we have discovered is
that for the global jitter channel in the limit $N_s \to \infty$ the
probability of sector error is upper-bounded as follows:

$$\Pr(SE) \lesssim \Pr(p_t < p_c) + p_p(p_c)\sqrt{\frac{\pi}{2N_s F_\tau''(p_c)}} \qquad (30)$$
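The half-Gaussian Laplace estimate (29) can be checked numerically on a concrete instance. The sketch below assumes NumPy, one bit per symbol (n = 1, so $e_1(p) = Q(p/(2\sigma))$), the amplitude density (3), and hypothetical values σ = 0.1, τ = 0.05, W = 0.5, σ_J = 0.2. It uses $F_\tau''(p_c) = e_1'(p_c)^2/(\tau(1-\tau))$, which holds because the first derivative of the divergence vanishes at $e_1 = \tau$.

```python
import math
import numpy as np

# Illustrative parameters (assumed): one bit per symbol (n = 1) for simplicity.
SIGMA, TAU, W, SIGMA_J = 0.1, 0.05, 0.5, 0.2
_erfc = np.vectorize(math.erfc)

def e1(p):
    """Bit error rate Q(p / (2 sigma)) of the Genie detector at amplitude p."""
    return 0.5 * _erfc(np.asarray(p) / (2.0 * SIGMA * math.sqrt(2.0)))

def F(p):
    """F_tau(p) = D_KL((1 - tau, tau) || (1 - e1(p), e1(p)))."""
    e = e1(p)
    return TAU * np.log(TAU / e) + (1.0 - TAU) * np.log((1.0 - TAU) / (1.0 - e))

def amplitude_pdf(p):
    """Density (3) of the jitter-degraded amplitude."""
    return (W / (math.sqrt(2.0 * math.pi) * SIGMA_J)
            * p ** (W**2 / (2.0 * SIGMA_J**2) - 1.0) / np.sqrt(np.log(1.0 / p)))

# Critical amplitude p_c with e1(p_c) = tau, found by bisection (e1 decreases in p).
lo, hi = 1e-6, 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if e1(mid) > TAU:
        lo = mid
    else:
        hi = mid
p_c = 0.5 * (lo + hi)

# At p_c the KL derivative vanishes, so F''(p_c) = e1'(p_c)^2 / (tau (1 - tau)).
u_c = p_c / (2.0 * SIGMA)
de1 = -math.exp(-0.5 * u_c**2) / (math.sqrt(2.0 * math.pi) * 2.0 * SIGMA)
F2 = de1**2 / (TAU * (1.0 - TAU))

def tail_integral(N, points=200_001):
    """Trapezoidal estimate of the second integral in (28)."""
    p = np.linspace(p_c, 0.999, points)
    y = amplitude_pdf(p) * np.exp(-N * F(p))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(p)))

def laplace_estimate(N):
    """Half-Gaussian Laplace approximation (29)."""
    return float(amplitude_pdf(p_c) * math.sqrt(math.pi / (2.0 * N * F2)))

for N in (10**3, 10**4, 10**5):
    print(N, tail_integral(N) / laplace_estimate(N))
```

The ratio approaches 1 as N grows, with a relative correction of order $N^{-1/2}$ coming from the slope of the density and the asymmetry of $F_\tau$ at $p_c$.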



We have therefore confirmed theoretically that for a fixed level of
noise and in the limit of large sector size, the probability of
sector error for information encoded with a Reed-Solomon
code and transmitted over the global jitter channel does not
depend on the code's block size. This conclusion is in perfect
agreement with the results of the numerical simulations shown in
Fig. 5.

Fig. 5. RS SER curves do not depend on the block size and exhibit an error
floor behaviour (n = 10, R = 0.8, $\sigma_J = 0.2$, $10^6$ sectors simulated; SER versus SNR in dB).

Using (30) we can also explain the shape of the SER
vs. SNR curves, thus confirming the appearance of the error
floor analytically. Moreover, we will be able to determine the
position of the error floor as a function of the code rate $R = 1 - 2\tau$.

Up to this point our considerations did not depend on the
specific shape of the impulse response or on jitter statistics.
From now on we will assume the Gaussian impulse response
given by equation (2) and Gaussian position jitter. Then it
follows from (30) that

$$\Pr(SE) \le \Pr(p_t < p_c) + O(N_s^{-1/2}) = \frac{2}{\sqrt{2\pi}\,\sigma_J}\int_{J_c}^{\infty} e^{-\frac{J^2}{2\sigma_J^2}}\,dJ + O(N_s^{-1/2}) \qquad (31)$$

where $J_c = W\sqrt{\ln(1/p_c)}$ is the critical value of jitter that
causes signal degradation $p_c$. Bounding the integral in equation
(31) with elementary functions (using the Gaussian tail bound $2Q(x) \le e^{-x^2/2}$ for $x \ge 0$), we arrive at the following:



$$\Pr(SE) \le p_c^{W^2/(2\sigma_J^2)} + O(N_s^{-1/2}) \qquad (32)$$



If we further assume that a high-rate Reed-Solomon code is used,
so that $\tau \ll 1$, then it follows that the symbol error rate
conditional on $p_c$, $e_1(p_c) = \tau$, is also much less than 1. In
this limit we can approximate the symbol error rate as follows:

$$e_1(p_c) = 1 - \big(1 - f(p_c)\big)^n \approx n f(p_c) \qquad (33)$$



where $f(p_c)$ is the bit error rate conditional on $p_c$, which admits
the following upper bound:

$$f(p_c) \le \frac{1}{2}\,e^{-p_c^2/(8\sigma^2)} \qquad (34)$$



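Note that this upper bound is tight on the exponential scale in the limit of large
signal-to-noise ratios, the region relevant for studying the error floor.
Therefore we can bound the critical point from above, $p_c \le \hat p_c$, where
$\hat p_c$ solves $\frac{n}{2}e^{-\hat p_c^2/(8\sigma^2)} = \tau$:

$$\hat p_c = \sigma\sqrt{8\ln\frac{n}{2\tau}} \qquad (35)$$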
Fig. 6. The rate of exponential decay of Reed-Solomon SER with SNR does
not depend on the code rate (n = 8, $N_s$ = 255, $\sigma_J = 0.2$, $10^6$ sectors simulated; simulated SER and Laplace estimate for R = 0.9, 0.7 and 0.5; SNR in dB).



Substituting (35) into equation (32), we arrive at the following
bound on the sector error rate for the global jitter channel with
the Gaussian impulse response for high-rate Reed-Solomon
codes:

$$\Pr(SE) \lesssim \Big[8\sigma^2\ln\frac{n}{2\tau}\Big]^{W^2/(4\sigma_J^2)} + O(N_s^{-1/2}). \qquad (36)$$
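The practical consequence of (36) is easiest to see as a slope on the SER-SNR plot. The sketch below evaluates the leading term of (36), assuming SNR = $10\log_{10}(1/\sigma^2)$ as in (4) and hypothetical defaults n = 10, W = 0.5, σ_J = 0.2. On a semi-logarithmic scale the slope is $-W^2/(40\sigma_J^2)$ decades per dB for every code rate, while the rate enters only through the logarithmic prefactor.

```python
import math

def rs_floor_bound(snr_db, R, n=10, W=0.5, sigma_J=0.2):
    """Leading term of (36): [8 sigma^2 ln(n / (2 tau))]^(W^2 / (4 sigma_J^2))."""
    sigma2 = 10.0 ** (-snr_db / 10.0)      # assumes SNR = 10 log10(1/sigma^2), eq. (4)
    tau = (1.0 - R) / 2.0                  # eq. (14)
    return (8.0 * sigma2 * math.log(n / (2.0 * tau))) ** (W**2 / (4.0 * sigma_J**2))

def decades_per_db(R, snr1=24.0, snr2=26.0):
    """Slope of log10 SER versus SNR in dB implied by (36)."""
    b1, b2 = rs_floor_bound(snr1, R), rs_floor_bound(snr2, R)
    return (math.log10(b2) - math.log10(b1)) / (snr2 - snr1)

for R in (0.5, 0.7, 0.9):
    print(R, rs_floor_bound(24.0, R), decades_per_db(R))
# The slope equals -W^2 / (40 sigma_J^2) = -0.15625 decades/dB for every R.
```

Raising the code rate from 0.5 to 0.9 only shifts the line upwards by a modest factor; it does not change its slope.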



This is a disastrous result: the probability of sector error decays
as a power law in σ with an exponent, $W^2/(2\sigma_J^2)$, that does not depend on
the code rate or the block size! The algebraic dependence of SER
on the noise strength implies the exponential dependence of
the probability of sector error on the signal-to-noise ratio, which
explains the straight lines on the semi-logarithmic SER-SNR plot
in Fig. 5.

Moreover, we observe that the position of the error floor
(determined by the pre-factor in (36)) depends very weakly
(logarithmically) on the code rate $R = 1 - 2\tau$.

In Fig. 6 we compare numerical simulations of RS codes
of various rates with expression (30), which is valid beyond
the high-rate approximation used to derive (36). We find
good agreement with our theoretical prediction: the rate of
the exponential decay depends on neither the block
size nor the code rate, and the position of the error floor changes
slowly with the code rate.

We conclude that non-interleaved Reed-Solomon codes applied
to channel (1) exhibit an error floor: the probability
of sector error decays exponentially as a function of SNR.
Moreover, the rate of the exponential decay depends on
neither the code rate nor the block size. The position of the error
floor depends only logarithmically on the code rate. Non-interleaved
Reed-Solomon codes are thus not suitable for use in a probe
storage system that suffers from global positioning errors.

The above discussion suggests that the performance of
Reed-Solomon codes can be improved with interleaving: by
spreading the codewords over multiple time samples, the effect
of occasional strong jitter leading to $p < p_c(\tau)$ can be
mitigated. We will discuss the complexity of this solution in
Section VIII.

But first, we will address the following foundational question:
do good codes for a global jitter channel exist in
principle?

V. Shannon Capacity of the Global Jitter Channel

Shannon's capacity C is one of the most important measures
of quality for any communication channel. According to
Claude Shannon's 1948 Channel Coding Theorem [19], there
exists no error correction code of rate R which would achieve
an arbitrarily small probability of error for a channel with
capacity smaller than the code rate, C < R. The 'positive' part
of Shannon's theorem states that for any R < C and for any
ε > 0 there exists a block code with block size M(ε)
and rate at least R, together with a decoding algorithm,
such that the maximal probability of block error is less than
ε. Thus by computing Shannon's capacity for the global jitter
channel we will establish an upper bound on the rate of good
error correction codes for this channel.

The definition of capacity rests on the notion of mutual information; see [11] for a review of fundamental notions of information theory. The mutual information between a discrete channel input $A$ taking values in $\Omega_A$ and a continuous channel output $R$ taking values in $\Omega_R$ is defined as follows:



$$I(R;A) = \sum_{a \in \Omega_A} \Pr_A(a) \int_{\Omega_R} d\Pr_{R|A}(r|a)\, \log_2 \frac{d\Pr_{R|A}(r|a)}{d\Pr_R(r)}, \tag{37}$$

where

$$\frac{d\Pr_{R|A}(r|a)}{d\Pr_R(r)}$$

is the Radon-Nikodym derivative of the conditional probability measure $\Pr_{R|A}$ with respect to the marginal probability measure $\Pr_R$. If the probability densities $p_{R|A}(r|a)$ and $p_R(r)$ of the probability measures $\Pr_{R|A}$ and $\Pr_R$ exist, the Radon-Nikodym derivative is simply the ratio of densities:

$$\frac{d\Pr_{R|A}(r|a)}{d\Pr_R(r)} = \frac{p_{R|A}(r|a)}{p_R(r)}.$$

Recall that for the global jitter channel, $a$ is an $N$-bit data string and $r$ is a string of $N$ real-valued signals generated by the probe array at a given sample time.

Shannon capacity is defined as the maximal mutual information over all input probability distributions, per bit of input:

$$C = \max_{\Pr_A} \frac{1}{N}\, I(R;A). \tag{38}$$

As is easy to see from the definition, $0 \le C \le 1$ (bit). In particular, $C = 1$ for noiseless channels. No information can be communicated over a channel with $C = 0$ in finite time.

For a complicated channel where the maximisation over $\Pr_A$ is difficult to perform, it is common to study a weaker form of capacity:

$$C_{i.u.d.} = \frac{1}{N}\, I(A;R)\Big|_{\Pr_A = \Pr_{i.u.d.}}, \tag{39}$$
where $\Pr_{i.u.d.}$ is the probability distribution for which channel inputs are chosen independently and with equal probability. It is clear that

$$C_{i.u.d.} \le C.$$

Therefore, if we find that $C_{i.u.d.}$ is close to 1, then $C$ is also close to 1, and we can be certain that there is a high-rate ECC scheme which achieves a low probability of error for the global jitter channel. From this point onwards we will be concerned only with $C_{i.u.d.}$ and assume that $\Pr_A(a) = 2^{-N}$, i.e. that all $2^N$ input sequences $a = (a_1, \ldots, a_N)$ are equally likely; equivalently, the bits $a_1, \ldots, a_N$ are independent and uniformly distributed.

The calculation of channel capacity is a notoriously difficult problem. Analytically, it can be evaluated only for the very simplest channels, such as the binary symmetric, binary erasure or AWGN channels [11, Chapter II]. There exist efficient numerical algorithms for calculating the capacity of channels with rapidly decaying correlations, such as ISI channels [20]. Unfortunately, numerical evaluation of capacity is not an option for channels with long-range correlations such as the global jitter channel: due to the strong correlations between all signals received at the same time, the calculation of the $N$-dimensional integral in the right-hand side of (37) cannot be reduced to a set of low-dimensional problems, which makes the numerical evaluation of capacity extremely inefficient.

Fortunately, it turns out that the capacity of channel (1) can be calculated asymptotically in the limit of large array size $N \gg 1$. The simplification which allows the capacity calculation in the large-$N$ limit is easy to understand. On the one hand, the received signals $r$ are independent conditionally on the value of the amplitude $\rho = \rho(J)$; conditionally on $\rho$, the channel capacity is given by the well-known expression for binary AWGN channels. On the other hand, the value of $\rho(J)$ can be extracted from the string of $N \gg 1$ received signals with relative accuracy of the order of $1/\sqrt{N}$ due to the law of large numbers.
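The law-of-large-numbers step can be checked with a short simulation. The sketch below is illustrative only and rests on assumptions that are not fixed in this section: on-off inputs $a_i \in \{0,1\}$ drawn i.u.d., outputs $r_i = \rho a_i + \sigma z_i$ with standard Gaussian $z_i$, and a hypothetical estimator $\hat\rho = 2\bar r$ (the detector actually used in the paper is not reproduced here). Since $\mathrm{Var}(r_i) = \rho^2/4 + \sigma^2$, the spread of $\hat\rho$ should shrink like $1/\sqrt{N}$.

```python
import numpy as np


def estimate_rho(rho, sigma, n_probes, rng):
    """Hypothetical LLN estimate of the common amplitude rho from one array read.

    Assumed model: a_i i.u.d. in {0, 1} and r_i = rho * a_i + sigma * z_i with
    z_i ~ N(0, 1). Since E[r_i] = rho / 2, the estimate 2 * mean(r) is unbiased.
    """
    a = rng.integers(0, 2, size=n_probes)
    r = rho * a + sigma * rng.standard_normal(n_probes)
    return 2.0 * r.mean()


def spread_of_estimate(rho, sigma, n_probes, trials=2000, seed=0):
    """Empirical std of rho_hat and the prediction 2 * sqrt(Var(r_i) / N)."""
    rng = np.random.default_rng(seed)
    est = np.array([estimate_rho(rho, sigma, n_probes, rng) for _ in range(trials)])
    predicted = 2.0 * np.sqrt((rho**2 / 4.0 + sigma**2) / n_probes)
    return float(est.std()), float(predicted)


if __name__ == "__main__":
    for n in (100, 1000, 4000):
        emp, pred = spread_of_estimate(rho=0.7, sigma=0.2, n_probes=n)
        print(f"N={n:5d}  empirical std={emp:.4f}  predicted={pred:.4f}")
```

Quadrupling $N$ should roughly halve the spread, which is the $1/\sqrt{N}$ accuracy invoked above.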

This simple argument suggests the following answer for the capacity of the global jitter channel:

$$\lim_{N\to\infty} C_{i.u.d.} = E_\rho\left[C_{AWGN}(\rho)\right], \tag{40}$$



where $C_{AWGN}(\rho)$ is the capacity of the AWGN channel with fixed signal amplitude $\rho$, given by:

$$C_{AWGN}(\rho) = 1 - \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2} \log_2\left(1 + e^{-f x - f^2/2}\right), \tag{41}$$



where $f = \rho/\sigma$ and $\sigma$ is the standard deviation of the additive white noise; see [11], Chapter II, for more details. In what follows we will also use the asymptotic expansion of $C_{AWGN}$ valid in the limit of weak noise, $f \gg 1$:

$$1 - C_{AWGN}(\rho) = \frac{\sqrt{2\pi}}{f \ln 2}\, e^{-f^2/8}\left(1 + O(f^{-2})\right). \tag{42}$$
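Equations (41) and (42) are easy to compare numerically. The following sketch, assuming NumPy is available, evaluates $1 - C_{AWGN}$ directly from the integral in (41) on a uniform grid (computing $1 - C$ rather than $C$ avoids cancellation at high SNR) and prints its ratio to the leading term of (42). The grid half-width and size are arbitrary choices, not values from the paper.

```python
import numpy as np


def one_minus_c_awgn(f, xmax=12.0, n=4001):
    """1 - C_AWGN(rho) from (41), with f = rho / sigma and x ~ N(0, 1).

    Uses a uniform grid on [-xmax, xmax]; log(1 + e^z) is evaluated stably as
    logaddexp(0, z). Returning 1 - C avoids cancellation when C is close to 1.
    """
    x = np.linspace(-xmax, xmax, n)
    phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    softplus = np.logaddexp(0.0, -f * x - 0.5 * f**2)
    return float(np.sum(phi * softplus) * (x[1] - x[0]) / np.log(2.0))


def c_awgn(f):
    """Binary-input AWGN capacity in bits for f = rho / sigma."""
    return 1.0 - one_minus_c_awgn(f)


def one_minus_c_asymptotic(f):
    """Leading term of (42): sqrt(2 pi) / (f ln 2) * exp(-f^2 / 8), for f >> 1."""
    return np.sqrt(2.0 * np.pi) / (f * np.log(2.0)) * np.exp(-f**2 / 8.0)


if __name__ == "__main__":
    for f in (2.0, 5.0, 10.0, 20.0):
        exact = one_minus_c_awgn(f)
        ratio = exact / one_minus_c_asymptotic(f)
        print(f"f={f:5.1f}  1-C={exact:.3e}  ratio to (42)={ratio:.3f}")
```

The ratio approaches 1 from below as $f$ grows; around $f \approx 10$ the $O(f^{-2})$ correction is still visible at the level of several percent.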

As it turns out, the derivation of (40) is very simple and relies on some basic properties of mutual information which are both fundamental and intuitively obvious. Firstly, we notice that the mutual information of the binary-input AWGN channel is maximized by the uniform distribution of inputs, so that, conditionally on $P = \rho$, the i.u.d. mutual information per probe equals $C_{AWGN}(\rho)$. Then a simple rearrangement of terms in (37) gives the following:

$$I(R;A) - N E_\rho\left(C_{AWGN}(\rho)\right) = I(P;R) - I(P;(A,R)), \tag{43}$$

where $I(P;R)$ is the mutual information between the signal amplitude degraded by jitter and the received signal, and $I(P;(A,R))$ is the mutual information between the signal amplitude and the joint ensemble of channel input and output. Clearly,



$$I(P;(A,R)) \ge I(P;R). \tag{44}$$

(The information we learn about $P$ from observing $A$ and $R$ must be greater than or equal to the information about $P$ contained in $R$ alone.) In case the above argument fails to convince a rigorous-minded reader, here is a proof based on Jensen's inequality:



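$$
\begin{aligned}
I(P;R) - I(P;(A,R)) &= E_{(P,R,A)} \log \frac{d\Pr_{(P,R)}\; d\Pr_{R|A}}{d\Pr_{(P,R)|A}\; d\Pr_R} \\
&\overset{\text{Jensen}}{\le} \log E_{(P,R,A)} \frac{d\Pr_{(P,R)}\; d\Pr_{R|A}}{d\Pr_{(P,R)|A}\; d\Pr_R} \\
&= \log \int d\Pr_{(P,R)}(\rho, r) \sum_{a} \Pr_A(a)\, \frac{d\Pr_{R|A}(r|a)}{d\Pr_R(r)} \\
&= \log \int d\Pr_{(P,R)} = \log 1 = 0,
\end{aligned}
$$

where the last line uses $\sum_a \Pr_A(a)\, d\Pr_{R|A}(r|a) = d\Pr_R(r)$.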

Using (44), relation (43) leads to the following inequality:

$$0 \le E_\rho\left(C_{AWGN}(\rho)\right) - \lim_{N\to\infty} C_{i.u.d.} \le \lim_{N\to\infty} \frac{1}{N}\, I(P;(A,R)).$$

Therefore, to verify (40) it remains to show that

$$\lim_{N\to\infty} \frac{1}{N}\, I(P;(A,R)) = 0. \tag{45}$$

Intuitively, the validity of the above claim is fairly obvious: imagine, for example, that the random variable $P$ is represented by $m$-bit numbers. Then the mutual information $I(P;(A,R))$ cannot exceed $m$ bits, and the limit in (45) is trivially zero. For the proof of (45) in full generality, the reader is referred to Appendix B.
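The identity (43) and the inequality (44) hold for any hidden parameter that is independent of the inputs, so they can be checked exactly on a tiny discrete toy model. In the sketch below (a hypothetical stand-in, not the probe channel) a hidden state $P$ fixes one loss probability shared by all $n$ binary Z-channels, loosely mimicking a common amplitude $\rho(J)$, and all mutual informations are computed by brute-force enumeration with NumPy. The i.u.d. mutual information of a single Z-channel use plays the role of $C_{AWGN}(\rho)$.

```python
import itertools

import numpy as np


def entropy(pmf):
    """Shannon entropy (bits) of a pmf stored in an array of any shape."""
    p = pmf[pmf > 0]
    return float(-np.sum(p * np.log2(p)))


def h2(q):
    """Binary entropy function in bits."""
    return entropy(np.array([q, 1.0 - q]))


def z_channel(a, r, q):
    """Pr(r | a) for one Z-channel use: a stored 1 is read as 0 with probability q."""
    if a == 0:
        return 1.0 if r == 0 else 0.0
    return q if r == 0 else 1.0 - q


def toy_joint(prior, loss, n):
    """Joint pmf Pr(P, A, R): hidden P picks loss[P], shared by all n sub-channels."""
    words = list(itertools.product((0, 1), repeat=n))
    joint = np.zeros((len(prior), len(words), len(words)))
    for k, (pk, qk) in enumerate(zip(prior, loss)):
        for i, a in enumerate(words):
            for j, r in enumerate(words):
                like = np.prod([z_channel(ai, ri, qk) for ai, ri in zip(a, r)])
                joint[k, i, j] = pk * 2.0**-n * like  # A is i.u.d. on {0,1}^n
    return joint


def check_identity(prior=(0.8, 0.2), loss=(0.05, 0.6), n=3):
    """Both sides of the toy analogue of (43), plus I(P;(A,R)) and H(P)."""
    j = toy_joint(prior, loss, n)
    h_p, h_a, h_r = (entropy(j.sum(axis=ax)) for ax in ((1, 2), (0, 2), (0, 1)))
    h_ar, h_pr, h_par = entropy(j.sum(axis=0)), entropy(j.sum(axis=1)), entropy(j)
    i_ra = h_r + h_a - h_ar        # I(R;A)
    i_pr = h_p + h_r - h_pr        # I(P;R)
    i_p_ar = h_p + h_ar - h_par    # I(P;(A,R))
    # i.u.d. mutual information of one Z-channel use, averaged over P
    per_bit = sum(pk * (h2((1.0 - qk) / 2.0) - 0.5 * h2(qk))
                  for pk, qk in zip(prior, loss))
    return i_ra - n * per_bit, i_pr - i_p_ar, i_p_ar, h_p


if __name__ == "__main__":
    lhs, rhs, i_p_ar, h_p = check_identity()
    print(f"LHS of (43) = {lhs:.12f}, RHS of (43) = {rhs:.12f}")
    print(f"I(P;(A,R)) = {i_p_ar:.4f} <= H(P) = {h_p:.4f}")
```

Since $I(P;(A,R)) \le H(P)$, which does not grow with $n$, the ratio $I(P;(A,R))/n$ vanishes as $n \to \infty$ in this toy, mirroring the $m$-bit argument above.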

The problem of computing the capacity of the global jitter channel for large values of $N$ is thus solved in principle: the right-hand side of (40) is a finite-dimensional integral which depends on the probability distribution of $\rho(J)$. In particular, it is well suited for numerical study. In Figure 7 the capacity of the probe storage channel suffering a global Gaussian jitter is shown for various values of the jitter strength $\sigma_J$. For $\sigma_J = 0$ the channel is equivalent to the binary AWGN channel with signal amplitude equal to one. As seen from the plot, this channel has the highest capacity. For a fixed SNR, the capacity of the global jitter channel decreases as $\sigma_J$ increases. For example, at the SNR point corresponding to $C_{AWGN} = 0.9$ bits, the capacity of the global jitter channel with $\sigma_J = 0.3$ is $C_{i.u.d.} = 0.62$ bits, which rules out the use of high-rate linear error correction codes for this channel. The good news is that the capacity seems to approach 1 bit per channel symbol in the limit of high SNR even in the presence of global jitter. According to Shannon's theorem, this means that by reducing channel noise one can read and write information reliably using large parallel probe arrays at a small redundancy cost.
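A minimal sketch of how the right-hand side of (40) can be evaluated numerically, assuming NumPy. Two modelling choices here are assumptions of the sketch: the amplitude model $\rho(J) = e^{-J^2/W^2}$ (the Gaussian pulse-loss model implied by $J_c = W\sqrt{\ln(1/\rho_c)}$ in Section VI) and a parametrisation by the noise level $\sigma$ rather than an SNR in dB. Because Fig. 7 uses an exponential impulse response, the numbers produced here are not expected to reproduce the figure; only the qualitative trends should carry over. The expectation over $J \sim N(0, \sigma_J^2)$ uses Gauss-Hermite quadrature via `numpy.polynomial.hermite_e.hermegauss`, whose weight function is $e^{-x^2/2}$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def c_awgn(f, xmax=12.0, n=4001):
    """Binary-input AWGN i.u.d. capacity (41) in bits, f = rho / sigma."""
    x = np.linspace(-xmax, xmax, n)
    phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    loss = np.sum(phi * np.logaddexp(0.0, -f * x - 0.5 * f**2)) * (x[1] - x[0])
    return 1.0 - float(loss / np.log(2.0))


def capacity_global_jitter(sigma, sigma_j, width=0.5, deg=80):
    """Large-N i.u.d. capacity (40), E_J[C_AWGN(rho(J))] with J ~ N(0, sigma_j^2).

    Assumes the pulse-loss model rho(J) = exp(-J^2 / width^2). hermegauss uses the
    weight exp(-x^2 / 2), whose weights sum to sqrt(2 pi).
    """
    if sigma_j == 0.0:
        return c_awgn(1.0 / sigma)
    nodes, weights = hermegauss(deg)
    rho = np.exp(-((sigma_j * nodes) ** 2) / width**2)
    caps = np.array([c_awgn(r / sigma) for r in rho])
    return float(np.sum(weights * caps) / np.sqrt(2.0 * np.pi))


if __name__ == "__main__":
    for sigma_j in (0.0, 0.1, 0.2, 0.3):
        caps = [capacity_global_jitter(s, sigma_j) for s in (0.3, 0.1, 0.03)]
        print(f"sigma_J={sigma_j:.1f}  C at sigma = 0.3, 0.1, 0.03: "
              + ", ".join(f"{c:.4f}" for c in caps))
```

Each row should show capacity rising towards 1 as $\sigma$ decreases, and each column should show capacity falling as $\sigma_J$ grows.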
Fig. 7. Capacity (i.u.d.) for the global jitter channel (Gaussian jitter, exponential impulse response, w = 0.5).

Fig. 8. Comparison of capacity (i.u.d.) for the global jitter channel against the reference channel (Gaussian jitter, exponential impulse response, w = 0.5).



Moreover, the global jitter channel can actually have a larger capacity than a collection of independent probes subject to individual jitter distortions of the same strength. In Figure 8 the capacity of the global jitter channel is compared against a reference channel in which each probe suffers an independent Gaussian jitter distortion of the same strength; for this reference channel the i.u.d. capacity can be easily computed numerically. We observe that at low SNR the reference channel has the same capacity as the global jitter channel, but at high SNR the global jitter channel actually has a larger capacity! We can explain this phenomenon by the fact that strong correlations between position jitters across the whole array can be used to extract extra information about the hidden parameters of the channel. This extra information can be used, for example, to build better signal detection algorithms, as discussed in Section III.

To complete our investigation of $C_{i.u.d.}$ we still need to: (i) confirm the numerical observation that capacity approaches 1 bit in the limit of low channel noise; (ii) estimate the corresponding rate of convergence. We will solve these problems for the exponential impulse response assuming weak additive noise and weak jitter, i.e.

$$\sigma \ll 1, \qquad \sigma_J \ll W. \tag{46}$$



While both of these conditions are reasonable in the context 
of applications, the weakness of jitter leads to a significant 
technical simplification of the argument given below. Our 
starting point is the following bound on the capacity of the 
AWGN channel: 



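$$1 - U \min\left(1, \frac{\sigma}{\rho}\right) e^{-\rho^2/(8\sigma^2)} \;\ge\; C_{AWGN}(\rho) \;\ge\; 1 - L\, e^{-\rho^2/(8\sigma^2)} \tag{47}$$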



for some positive constants $U$ and $L$. Eq. (47) results from a straightforward yet tedious analysis of (41). Averaging the above inequality over $\rho$, we obtain the following bounds for the capacity of the global jitter channel:



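$$\lim_{N\to\infty} C_{i.u.d.} \ge 1 - L'\left[\Gamma\!\left(1 + \frac{W^2}{4\sigma_J^2}\right)\left(8\sigma^2\right)^{W^2/(4\sigma_J^2)} + e^{-1/(8\sigma^2)}\right], \tag{48}$$

$$\lim_{N\to\infty} C_{i.u.d.} \le 1 - U'\, \frac{\sigma_J}{W\sqrt{\ln\left(1/(8\sigma^2)\right)}}\left(8\sigma^2\right)^{W^2/(4\sigma_J^2)}, \tag{49}$$

where $L'$ and $U'$ are positive constants and $\Gamma(z)$ is the Gamma function,

$$\Gamma(z) = \int_0^\infty dx\, x^{z-1} e^{-x}.$$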

As the right-hand side of the lower bound (48) approaches 1 bit in the low-noise limit $\sigma \to 0$, we conclude that

$$\lim_{\sigma\to 0}\, \lim_{N\to\infty} C_{i.u.d.} = 1,$$

which confirms the results of numerical simulations. The upper bound (49) shows that, up to a logarithmic factor, the convergence to one is not faster than the power law $\sigma^{W^2/(2\sigma_J^2)}$, which is much slower than the exponential convergence of $C_{AWGN}(\rho) \approx 1 - A e^{-\rho^2/(8\sigma^2)}$ to one: only a very 'clean' global jitter channel will have capacity close to the capacity of the AWGN channel.

Finally, comparing the lower capacity bound with the upper bound, we see that with logarithmic precision the convergence of $C_{i.u.d.}$ for large $N$ is indeed given by the power law:

$$1 - \lim_{N\to\infty} C_{i.u.d.} \approx \mathrm{Const}\cdot \sigma^{W^2/(2\sigma_J^2)}. \tag{50}$$

Therefore, the capacity approaches 1 bit in exactly the same way as the Reed-Solomon sector error rate vanishes in the limit of zero additive noise; see (36). The power-law approach of the channel characteristics to their noiseless limiting values seems to be a feature of the global jitter channel. We will encounter the law (50) again in the next Section, when we compute Gallager's random coding bound for the global jitter channel.

VI. The Random Coding Bound for the Global Jitter Channel

According to the results of the previous Section, the capacity of the global jitter channel is positive and even approaches 1 bit in the limit of low noise. Therefore, it follows from Shannon's theorem that there are error correction codes with rate up to the capacity $C$ which can be used to transmit information over the channel with vanishingly small probability of error. Yet, as we have seen in Section IV, non-interleaved Reed-Solomon codes exhibit an error floor for any block size and any rate.

Therefore, we are led to the following question: are there any error correction block codes (linear or non-linear) which yield a small probability of block error if the code block coincides with the instantaneous output of the probe array?

Let $\mathcal{C}$ be a code with rate $R$ and block size $N$ (bits), which is also the number of probes in the array. Let $\Pr(SE \mid \mathcal{C})$ be the probability of block decoding error for this code.

This probability is difficult to compute for any non-trivial code. However, we can use the idea of Gallager [21] and calculate the average probability $\overline{\Pr}(SE \mid R, N)$ of block error under maximum likelihood decoding, where the average is taken over all random codes with a given rate $R$ and block size $N$. Given $\overline{\Pr}(SE \mid R, N)$, there must exist a code $\mathcal{C}_0$ with rate $R$ and block size $N$ such that

$$\Pr(SE \mid \mathcal{C}_0) \le \overline{\Pr}(SE \mid R, N).$$

Therefore, if we find that $\overline{\Pr}(SE \mid R, N)$ is sufficiently low (e.g. $10^{-12}$) for a desired code rate $R$, we will know that there are codes which can be used to correct errors for information communicated over the global jitter channel.

To perform the calculation of the random coding bound, we need to define the space of random codes and a probability measure on this space. Following Gallager, we construct a random binary code by picking $2^K$ binary codewords, $K = NR$, from the $N$-dimensional binary space $\{0,1\}^N$ independently and uniformly (the Gallager ensemble).

Using the sum rule, $\Pr(SE \mid \mathcal{C})$ can be rewritten as follows:

$$\Pr(SE \mid \mathcal{C}) = \int_0^1 d\mu_P(\rho)\, \Pr(SE \mid \mathcal{C}, \rho),$$

where $\Pr(SE \mid \mathcal{C}, \rho)$ is the probability of block decoding error for the binary AWGN channel with a fixed signal amplitude $\rho$. Therefore,

$$\overline{\Pr}(SE \mid R, N) = \int_0^1 d\mu_P(\rho)\, \overline{\Pr}(SE \mid \rho, R, N), \tag{51}$$

where $\overline{\Pr}(SE \mid \rho, R, N)$ is the random coding bound for the binary AWGN channel with a fixed signal amplitude $\rho$. The following set of results can be easily extracted from Gallager's original paper [21]:

$$\overline{\Pr}(SE \mid \rho, R, N) \le e^{-N E(R,\rho)}, \tag{52}$$

where $E(R,\rho)$ is the error exponent for the binary AWGN channel. In what follows we do not need an explicit expression for the error exponent, but its basic properties listed below will be important to us:

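$$E(R,\rho) = 0 \quad \text{for } R \ge C_{AWGN}(\rho), \tag{53}$$

$$E(R,\rho) > 0 \quad \text{for } R < C_{AWGN}(\rho), \tag{54}$$

$$\frac{\partial E}{\partial R}(R,\rho)\Big|_{R = C_{AWGN}(\rho)} = 0, \tag{55}$$

$$\frac{\partial^2 E}{\partial R^2}(R,\rho)\Big|_{R = C_{AWGN}(\rho)} > 0. \tag{56}$$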



Using the notion of the error exponent for the AWGN channel, we can rewrite the random coding bound for the global jitter channel as follows:

$$\overline{\Pr}(SE \mid R, N) \le \int_0^1 d\mu_P(\rho)\, \exp\left[-N E(R,\rho)\right]. \tag{57}$$

Let $\rho_c$ be the unique solution to

$$C_{AWGN}(\rho_c) = R. \tag{58}$$

Then (57) can be rewritten as:

$$\overline{\Pr}(SE \mid R, N) \le \int_0^{\rho_c} d\mu_P(\rho)\, \exp\left[-N E(R,\rho)\right] + \int_{\rho_c}^1 d\mu_P(\rho)\, \exp\left[-N E(R,\rho)\right]. \tag{59}$$

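The crucial observation is that for $\rho < \rho_c$, $C_{AWGN}(\rho) < R$, since $C_{AWGN}$ increases with $\rho$. As a result, for $\rho < \rho_c$, $E(R,\rho) = 0$ due to (53). Therefore the average probability of decoding error for the global jitter channel is bounded:

$$\overline{\Pr}(SE \mid R, N) \le \int_0^{\rho_c} d\mu_P(\rho) + \int_{\rho_c}^1 d\mu_P(\rho)\, \exp\left[-N E(R,\rho)\right]. \tag{60}$$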

The second integral in the right-hand side of (60) can be evaluated for $N \gg 1$ using Laplace's method: due to (54) and (55), the main contribution to the integral comes from a small neighbourhood of $\rho_c$. It follows from (55) and (56) that, as a function of $\rho$, $E(R,\rho)$ has a non-degenerate critical point at $\rho = \rho_c$. A calculation based on the above points yields:

$$\overline{\Pr}(SE \mid R, N) \le \Pr(P < \rho_c) + p_P(\rho_c)\sqrt{\frac{\pi}{2N\, \partial_\rho^2 E(R,\rho_c)}} + O(N^{-1}),$$

where $p_P$ denotes the probability density of $\mu_P$.
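The Laplace-method step can be illustrated on a toy integral whose answer is known. The sketch below does not use the actual AWGN error exponent: it assumes a density $p(\rho) = 2\rho$ on $[0,1]$ and a purely quadratic exponent $E(\rho) = a(\rho - \rho_c)^2/2$, which has the non-degenerate critical point required above. The leading term $p(\rho_c)\sqrt{\pi/(2Na)}$ should then agree with direct quadrature up to a correction of order $N^{-1}$, which for this toy is essentially $p'(\rho_c)/(aN) = 2/N$.

```python
import math


def endpoint_integral(n, rho_c=0.5, a=1.0, steps=200_000):
    """Trapezoid rule for int_{rho_c}^1 p(rho) exp(-n E(rho)) d rho (toy model).

    Toy choices, not the paper's: density p(rho) = 2 rho on [0, 1] and exponent
    E(rho) = a (rho - rho_c)^2 / 2, a non-degenerate critical point at rho_c.
    """
    h = (1.0 - rho_c) / steps
    total = 0.0
    for i in range(steps + 1):
        rho = rho_c + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 2.0 * rho * math.exp(-n * a * (rho - rho_c) ** 2 / 2.0)
    return total * h


def laplace_leading(n, rho_c=0.5, a=1.0):
    """Leading Laplace term p(rho_c) * sqrt(pi / (2 n a)) for the toy model."""
    return 2.0 * rho_c * math.sqrt(math.pi / (2.0 * n * a))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        exact, lead = endpoint_integral(n), laplace_leading(n)
        print(f"N={n:6d}  exact={exact:.6f}  leading={lead:.6f}  "
              f"N*(exact-leading)={n * (exact - lead):.4f}")
```

The leading term decays like $N^{-1/2}$, so for large $N$ the $O(N^{-1})$ remainder becomes a vanishing fraction of it, as claimed above.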

It is remarkable that in the limit $N \to \infty$ the probability of block decoding error is bounded by a function independent of the block size:

$$\lim_{N\to\infty} \overline{\Pr}(SE \mid N, R) \le \Pr(P < \rho_c). \tag{61}$$

Therefore, the random coding bound for the global jitter channel does not vanish in the limit of large block size!
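For Gaussian jitter, the right-hand side of (61) is cheap to evaluate: find $\rho_c$ from (58) by bisection, then compute a Gaussian tail probability for $|J|$. The sketch below assumes NumPy, the amplitude model $\rho(J) = e^{-J^2/W^2}$ (so that $P < \rho_c$ exactly when $|J| > J_c = W\sqrt{\ln(1/\rho_c)}$, as in the derivation of (62)), and a default $W = 0.5$ taken from the figure captions; the function names are illustrative.

```python
import math

import numpy as np


def c_awgn(f, xmax=12.0, n=4001):
    """Binary-input AWGN i.u.d. capacity (41) in bits, f = rho / sigma."""
    x = np.linspace(-xmax, xmax, n)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    loss = np.sum(phi * np.logaddexp(0.0, -f * x - 0.5 * f**2)) * (x[1] - x[0])
    return 1.0 - float(loss) / math.log(2.0)


def critical_amplitude(rate, sigma, iters=60):
    """Solve C_AWGN(rho_c) = rate, eq. (58), by bisection on (0, 1).

    Returns None when even rho = 1 gives C_AWGN <= rate (then rho_c >= 1).
    """
    if c_awgn(1.0 / sigma) <= rate:
        return None
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if c_awgn(mid / sigma) < rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def random_coding_floor(rate, sigma, sigma_j, width=0.5):
    """Right-hand side of (61), Pr(P < rho_c), for J ~ N(0, sigma_j^2).

    Assumes rho(J) = exp(-J^2 / width^2), so P < rho_c exactly when |J| > J_c
    with J_c = width * sqrt(ln(1 / rho_c)).
    """
    rho_c = critical_amplitude(rate, sigma)
    if rho_c is None:
        return 1.0  # rho_c >= 1: every amplitude lies below the threshold
    j_c = width * math.sqrt(math.log(1.0 / rho_c))
    return math.erfc(j_c / (math.sqrt(2.0) * sigma_j))


if __name__ == "__main__":
    for rate in (0.8, 0.9, 0.95):
        floors = [random_coding_floor(rate, s, sigma_j=0.2) for s in (0.1, 0.05, 0.02)]
        print(f"R={rate}: " + ", ".join(f"{p:.2e}" for p in floors))
```

The floor falls as $\sigma$ decreases and rises with the rate $R$ and the jitter strength $\sigma_J$, but it does not depend on $N$ at all.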

In Figure 9 the random coding bound (61) is shown for various code rates for a global Gaussian positioning error with strength $\sigma_J = 0.2$. The random coding bound for a channel where each probe in an array of $N = 1000$ suffers an independent Gaussian positioning error of the same strength is also given for the same code rates. The probability of decoding error (SER) is vastly worse in the case of global positioning errors. For the independent channel the SER decays with a waterfall (super-exponential) shape, whereas for the global jitter channel the SER decays exponentially with SNR, i.e. it exhibits an error floor.

Fig. 9. Random coding bound for the probe storage channel with independent and global jitter for various different rates.

Fig. 10. Behaviour of the random coding bound as a function of rate for the probe storage channel suffering a global Gaussian positioning error (exponential pulse, Gaussian jitter, $\sigma_J = 0.2$, $w = 0.5$, SNR = 15 dB).

The presence of the error floor in the random coding bound can be demonstrated analytically for Gaussian impulse response and Gaussian jitter in the limit of high SNR and high-rate codes. The cumulative distribution function $\Pr(P < \rho_c)$ is given by:

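$$\Pr(P < \rho_c) = \frac{2}{\sqrt{2\pi}\,\sigma_J}\int_{J_c}^{\infty} dJ\, \exp\left(-\frac{J^2}{2\sigma_J^2}\right) < \exp\left(-\frac{J_c^2}{2\sigma_J^2}\right), \tag{62}$$

where $J_c = W\sqrt{\ln(1/\rho_c)}$ is the positive value of jitter that results in the signal loss $\rho_c$. Recall that $W$ is the parameter related to the pulse width. Thus the average probability of decoding error is bounded by: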



$$\lim_{N\to\infty} \overline{\Pr}(SE \mid R, N) \le \rho_c^{\,W^2/(2\sigma_J^2)}. \tag{63}$$

Furthermore, in the limit of weak noise $\sigma \to 0$ and high rate $R \to 1$, it is possible to use the asymptotic expansion (42) of the AWGN capacity $C_{AWGN}(\rho)$ to derive an expression for $\rho_c$ to logarithmic precision:

$$\rho_c \approx \sigma\sqrt{8\ln\left(\frac{1}{1-R}\right)}. \tag{64}$$



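Thus for high-rate codes the probability of decoding error can be approximately bounded:

$$\lim_{N\to\infty} \overline{\Pr}(SE \mid R, N) \lesssim C(\sigma_J, W, R)\, \sigma^{W^2/(2\sigma_J^2)}, \tag{65}$$

where $C(\sigma_J, W, R)$ is the $\sigma$-independent constant:

$$C(\sigma_J, W, R) = \left(-8\ln(1-R)\right)^{W^2/(4\sigma_J^2)}. \tag{66}$$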
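The chain (62) to (66) can be sanity-checked numerically. The sketch below assumes NumPy, Gaussian jitter, the amplitude model $\rho(J) = e^{-J^2/W^2}$ and a default $W = 0.5$. It compares the exact floor $\Pr(P < \rho_c)$ with the bound $\rho_c^{W^2/(2\sigma_J^2)}$ of (63), and the bisection value of $\rho_c$ with the high-rate approximation (64). Because (64) holds only to logarithmic precision, the two values of $\rho_c$ are expected to differ by a modest factor rather than agree exactly.

```python
import math

import numpy as np


def c_awgn(f, xmax=12.0, n=4001):
    """Binary-input AWGN i.u.d. capacity (41) in bits, f = rho / sigma."""
    x = np.linspace(-xmax, xmax, n)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    loss = np.sum(phi * np.logaddexp(0.0, -f * x - 0.5 * f**2)) * (x[1] - x[0])
    return 1.0 - float(loss) / math.log(2.0)


def critical_amplitude(rate, sigma, iters=60):
    """Bisection solution of C_AWGN(rho_c) = rate, eq. (58), on (0, 1)."""
    if c_awgn(1.0 / sigma) <= rate:
        raise ValueError("rho_c >= 1 for these parameters")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if c_awgn(mid / sigma) < rate else (lo, mid)
    return 0.5 * (lo + hi)


def floor_and_bounds(rate, sigma, sigma_j, width=0.5):
    """Exact floor Pr(P < rho_c) versus the bound (63) and approximations (64)-(66)."""
    rho_c = critical_amplitude(rate, sigma)
    gamma = width**2 / (2.0 * sigma_j**2)
    j_c = width * math.sqrt(math.log(1.0 / rho_c))
    exact = math.erfc(j_c / (math.sqrt(2.0) * sigma_j))                # Pr(P < rho_c)
    bound_63 = rho_c**gamma                                            # (63)
    rho_c_64 = sigma * math.sqrt(8.0 * math.log(1.0 / (1.0 - rate)))   # (64)
    const_66 = (-8.0 * math.log(1.0 - rate)) ** (width**2 / (4.0 * sigma_j**2))
    bound_65 = const_66 * sigma**gamma                                 # (65)-(66)
    return {"rho_c": rho_c, "rho_c_64": rho_c_64, "exact": exact,
            "bound_63": bound_63, "bound_65": bound_65}


if __name__ == "__main__":
    for key, value in floor_and_bounds(0.999, 0.01, 0.2).items():
        print(f"{key:9s} {value:.4e}")
```

Note that (65)-(66) is simply (63) evaluated at the approximate $\rho_c$ of (64), which the sketch makes explicit.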

We conclude that the average probability of decoding error in the limit of weak noise and high code rate is asymptotically independent of the code block size, and exhibits an error floor with an exponent independent of the code rate $R$ and an amplitude which depends on the rate via $\log(1-R)$. An identical behaviour has been observed for Reed-Solomon codes in Section IV. Similar conclusions concerning the random coding bound can be reached for low code rates as well; see Fig. 10, where the bound (61) is shown as a function of the rate $R$ for a fixed SNR. Note that the average probability of block error approaches zero only in the limit of zero code rate.

The fact that the random coding bound exhibits error floor behaviour identical to that of RS codes suggests that all non-interleaved error correction codes suffer identical performance degradation due to global jitter. This suggestion is confirmed in the following Section.

VII. The Non-Existence of Non-Interleaved Error Correction Codes for the Global Jitter Channel

The existence of an $N$-independent error floor in the average probability of block error has the following simple explanation: provided the strength of the additive noise is positive and $R > 0$, there is an $N$-independent critical amplitude $\rho_c(R)$ (equivalently, a critical jitter value $J_c$) such that for $\rho < \rho_c(R)$ the conditional channel capacity $C_{AWGN}(\rho)$ is smaller than the code rate $R$. Therefore, according to the negative part of Shannon's theorem, information transmission without errors is impossible with the $N$-independent probability $\Pr(P < \rho_c(R))$.

The above consideration can be turned into a rigorous argument which shows that non-interleaved encoding of the large array's outputs cannot ensure error-free information retrieval, no matter what error correction code with positive rate is used.

Let $\mathcal{C}(R,N)$ be a block code with rate $R$ and block size $N$. Clearly,

$$\Pr(SE \mid \mathcal{C}(R,N)) \ge \int_0^{\rho_c} d\mu_P(\rho)\, \Pr(SE \mid \rho, \mathcal{C}(R,N)). \tag{67}$$

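Recall that $\rho_c$ is the unique solution to $R = C_{AWGN}(\rho)$. Notice that for every $\rho$ in the region of integration, $R > C_{AWGN}(\rho)$. By Fano's inequality [22],

$$\Pr(SE \mid \rho, \mathcal{C}(R,N)) \ge \frac{R - C_{AWGN}(\rho)}{R} - \frac{1}{RN}. \tag{68}$$

Therefore,

$$\Pr(SE \mid \mathcal{C}(R,N)) \ge \int_0^{\rho_c} d\mu_P(\rho)\left(1 - \frac{C_{AWGN}(\rho)}{R}\right) - \frac{\Pr(P < \rho_c)}{RN}. \tag{69}$$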
Therefore, the probability of block error is bounded below by a constant asymptotically independent of the block size. Integrating (69) by parts, it is easy to show that

$$\lim_{N\to\infty} \Pr(SE \mid \mathcal{C}(R,N)) \ge \int_0^R \frac{dc}{R}\, \Pr\left(C_{AWGN}(P) < c\right), \tag{70}$$

where

$$\Pr\left(C_{AWGN}(P) < c\right) = \int d\mu_P(\rho)\, \chi\left(C_{AWGN}(\rho) < c\right)$$

and $\chi$ denotes the indicator function.



Expression (70) proves that the probability of block error is bounded away from zero by an $N$-independent constant for any non-interleaved error correction code with a positive rate $R$.

Therefore, error-free information storage using large arrays of probes is impossible without interleaving error correction codes between multiple array outputs.

It is worth noting that there is a counterpart of Fano's inequality for the global jitter channel, a non-trivial fact given the strong correlations between all probe channels within the array. Using the fact that $\lim_{N\to\infty} C_{i.u.d.} = E_\rho(C_{AWGN}(\rho))$, which we established in Section V, we can derive the following weaker bound from the bound (70):

$$\lim_{N\to\infty} \Pr(SE \mid \mathcal{C}(R,N)) \ge 1 - \frac{\lim_{N\to\infty} C_{i.u.d.}}{R}, \tag{71}$$

which shows the impossibility of error-free information transmission over the global jitter channel using non-interleaved codes with $R > \lim_{N\to\infty} C_{i.u.d.}$.
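Because $(R - C)^+ = \int_0^R \chi(C < c)\, dc$, the bound (70) can equivalently be written as $E_\rho[(R - C_{AWGN}(\rho))^+]/R$, which is straightforward to evaluate. The following sketch assumes NumPy, Gaussian jitter with the amplitude model $\rho(J) = e^{-J^2/W^2}$ and $W = 0.5$. It computes the large-$N$ bounds (70) and (71) together with $\Pr(C_{AWGN}(P) < R)$, the limit of the random coding bound (61). Pointwise inequalities guarantee $(71) \le (70) \le (61)$, so the check is one of internal consistency rather than a new result.

```python
import math

import numpy as np


def c_awgn(f, xmax=12.0, n=4001):
    """Binary-input AWGN i.u.d. capacity (41) in bits, f = rho / sigma."""
    x = np.linspace(-xmax, xmax, n)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    loss = np.sum(phi * np.logaddexp(0.0, -f * x - 0.5 * f**2)) * (x[1] - x[0])
    return 1.0 - float(loss) / math.log(2.0)


def fano_bounds(rate, sigma, sigma_j, width=0.5, half_range=8.0, points=1601):
    """Large-N bounds for Gaussian jitter: returns ((70), (71), Pr(C_AWGN(P) < R)).

    Assumes J ~ N(0, sigma_j^2) and rho(J) = exp(-J^2 / width^2); the average over
    J uses a uniform grid in x = J / sigma_j.
    """
    x = np.linspace(-half_range, half_range, points)
    w = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi) * (x[1] - x[0])
    rho = np.exp(-((sigma_j * x) ** 2) / width**2)
    cap = np.array([c_awgn(r / sigma) for r in rho])
    bound_70 = float(np.sum(w * np.maximum(rate - cap, 0.0))) / rate
    bound_71 = 1.0 - float(np.sum(w * cap)) / rate
    floor_61 = float(np.sum(w * (cap < rate)))
    return bound_70, bound_71, floor_61


if __name__ == "__main__":
    b70, b71, f61 = fano_bounds(rate=0.9, sigma=0.15, sigma_j=0.2)
    print(f"(71): {b71:.4f}   (70): {b70:.4e}   random-coding floor (61): {f61:.4e}")
```

For rates below $\lim_{N\to\infty} C_{i.u.d.}$ the weaker bound (71) is negative and hence vacuous, while (70) remains strictly positive.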

Finally, let us show that any non-interleaved code used to encode information transmitted over the Gaussian global jitter channel exhibits an error floor identical to the Reed-Solomon error floor (36) or the error floor in the random coding bound (63). Note that the capacity of the AWGN channel $C_{AWGN}(\rho)$ is a function of $\rho/\sigma$ only,

$$C_{AWGN}(\rho) = F\left(\frac{\rho}{\sigma}\right);$$

see (41). Substituting the expression for the probability measure $d\mu_P(\rho)$ for Gaussian jitter into the integral in the right-hand side of (69) and changing the integration variable to $\rho = \sigma x$, we arrive at the following result:



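$$\lim_{N\to\infty} \Pr(SE \mid \mathcal{C}(R,N)) \ge \frac{W\sigma^{\gamma}}{\sqrt{2\pi}\,\sigma_J}\int_0^{x_c} dx\, \frac{x^{\gamma-1}}{\sqrt{\ln\frac{1}{\sigma} + \ln\frac{1}{x}}}\left(1 - \frac{F(x)}{R}\right), \tag{72}$$

where $\gamma = W^2/(2\sigma_J^2)$ and $x_c$ is the unique positive solution to the equation

$$F(x_c) = R. \tag{73}$$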



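Notice that $x_c\sigma = \rho_c < 1$. The following inequality is valid provided the additive noise is weak: if $\ln\frac{1}{x_c\sigma} > \frac{1}{2}$, then for any $x$ with $0 < x < x_c$,

$$\frac{1}{\sqrt{\ln\frac{1}{\sigma} + \ln\frac{1}{x}}} \ge \frac{1}{\sqrt{\ln\frac{1}{x_c\sigma}}\left(1 + \ln\frac{x_c}{x}\right)}. \tag{74}$$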



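Using this inequality in (72), we conclude the following:

$$\lim_{N\to\infty} \Pr(SE \mid \mathcal{C}(R,N)) \ge \frac{\sigma^{\gamma}}{\sqrt{\ln\frac{1}{x_c\sigma}}}\, I(R), \tag{75}$$

where

$$I(R) = \sqrt{\frac{\gamma}{\pi}}\int_0^{x_c} dx\, \frac{x^{\gamma-1}}{1 + \ln(x_c/x)}\left(1 - \frac{F(x)}{R}\right)$$

is a $\sigma$-independent function of $R$ and $\gamma = W^2/(2\sigma_J^2)$.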

We conclude that any non-interleaved error correction code with rate $R$ and a large block size used to transmit information over the global jitter channel exhibits an exponential error floor: to logarithmic precision, $\log \Pr(SE)$ is a linear function of the signal-to-noise ratio (in dB) with exponent

$$\gamma = \frac{W^2}{2\sigma_J^2}. \tag{76}$$

The position of the error floor is determined by the function $I(R)$, which is a slowly varying function of the code rate, but we do not study it here. Note that the lower bound on the error probability which we derived using elementary tools only (Fano's inequality) has the same exponent as the error floor observed directly for Reed-Solomon codes and the error floor of the random coding bound for the global jitter channel.
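The exponent (76) can also be seen numerically without going through the inequalities (72) to (75): evaluate (70) directly for two small noise levels and measure the slope of $\log \Pr(SE)$ against $\log \sigma$. The sketch below assumes NumPy, Gaussian jitter with $\rho(J) = e^{-J^2/W^2}$, $W = 0.5$, $\sigma_J = 0.2$ and $R = 0.9$ (so $\gamma = 3.125$). The fitted slope is expected to exceed $\gamma$ slightly because of the logarithmic factors that (76) neglects.

```python
import math

import numpy as np


def c_awgn(f, xmax=12.0, n=4001):
    """Binary-input AWGN i.u.d. capacity (41) in bits, f = rho / sigma."""
    x = np.linspace(-xmax, xmax, n)
    phi = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    loss = np.sum(phi * np.logaddexp(0.0, -f * x - 0.5 * f**2)) * (x[1] - x[0])
    return 1.0 - float(loss) / math.log(2.0)


def fano_floor(rate, sigma, sigma_j, width=0.5, half_range=10.0, points=4001):
    """Large-N lower bound (70), written as E[(R - C_AWGN(P))^+] / R.

    Assumes J ~ N(0, sigma_j^2) and rho(J) = exp(-J^2 / width^2); integrates over
    x = J / sigma_j >= 0 on a uniform grid, using that rho(J) is even in J.
    """
    x = np.linspace(0.0, half_range, points)
    w = 2.0 * np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi) * (x[1] - x[0])
    w[0] *= 0.5  # the point x = 0 is not mirrored
    rho = np.exp(-((sigma_j * x) ** 2) / width**2)
    deficit = np.array([max(rate - c_awgn(r / sigma), 0.0) for r in rho])
    return float(np.sum(w * deficit)) / rate


if __name__ == "__main__":
    rate, sigma_j, width = 0.9, 0.2, 0.5
    gamma = width**2 / (2.0 * sigma_j**2)  # exponent (76)
    b_hi, b_lo = fano_floor(rate, 1e-2, sigma_j), fano_floor(rate, 1e-3, sigma_j)
    slope = (math.log(b_hi) - math.log(b_lo)) / math.log(10.0)
    print(f"gamma = {gamma:.3f}, fitted log-log slope = {slope:.3f}")
```

A tenfold reduction of $\sigma$ (20 dB of SNR) lowers the bound by roughly $\gamma$ decades, which is the linear-in-dB behaviour described above.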

VIII. On Good Codes for the Global Jitter Channel

In the two previous Sections we established the impossibility of error-free information retrieval for large parallel arrays of probes subject to global jitter using non-interleaved error correction codes of any positive code rate.

We also established in Section V that the capacity $C_{i.u.d.}$ of the global jitter channel is positive and approaches 1 bit as the additive noise strength $\sigma$ goes to 0. Therefore, by Shannon's theorem, there must exist families of error correction codes with rates $0 < R < C_{i.u.d.}$ which ensure a vanishingly small probability of block error in the limit of large block sizes.

Can we say anything about the structure of these codes? We understand that non-interleaved codes cannot perform well on the global jitter channel due to rare strong jitter fluctuations leading to a small effective AWGN amplitude $\rho < \sigma$. Due to these fluctuations, $N$ bits of information get lost regardless of the error correction code used.

To avoid this, we must spread information over many time slices by encoding blocks of array outputs. Therefore, good error correction codes for the global jitter channel must be interleaved.

Let us estimate the depth of interleaving and the block size of the corresponding codes. Let $(r_t)_{t\in\mathbb{Z}}$ be the time series of array outputs. Recall that each output is an $N$-dimensional vector, where $N$ is the number of probes in the array. For a feedback-loop based positioning system such as the one used in Millipede, the jitter random variables $J_t$ are correlated in time. Let $L$ be the correlation length measured in the number of sampling periods. Our previous discussion of the influence of correlations on the performance of error correction codes can be summed up as follows: the probability of decoding error of a block code of rate $R < C_{i.u.d.}$ and block size $B$ approaches zero as the number of independent groups of samples, roughly $B/(NL)$, goes to infinity. In other words, if $NL$ strongly correlated samples are treated as a single symbol, we get the usual statement of Shannon's theorem for memoryless channels: the probability of block error goes to zero as the number of symbols per block goes to infinity.

The minimal block size of high-rate error correction codes used in modern storage devices which achieve a probability of error of the order of $10^{-10}$ is of the order of $10^3$ bits. Clearly, this is a lower bound on the interleaving depth, i.e. on the number of independent groups of samples per block. Therefore, we can estimate the block size of the interleaved error correction code for probe storage as

$$B \sim 10^3\, N L. \tag{77}$$

Thus, for a probe storage device with $N \sim 10^3$ probes in the array, the error correction block size should be of the order of

$$B \gtrsim 10^6 \text{ bits}.$$

Codes of similar sizes ($5 \cdot 10^5$ bits) are already used in optical storage (Blu-ray discs); however, we found it unusual to have discovered the need for such large block sizes for a non-removable media storage device.

IX. Conclusion 

In this paper we introduced and analyzed a simple model of the massively parallel probe storage channel suffering from global positioning errors. This model can be viewed as an example of a communication channel consisting of $N \gg 1$ parallel sub-channels, all of which are strongly correlated via a common distortion event.

We solved the problem of optimal signal detection for the global jitter channel. Namely, we found a detection algorithm of $O(N \log N)$ complexity such that the estimated sequence converges bitwise to the maximum a posteriori estimate in the limit $N \to \infty$. This is not an entirely trivial result, as in general the complexity of optimal detection grows exponentially with memory length (for example, the optimal MAP detector for the channel with inter-symbol interference of length $l$ and AWG noise has $2^l$ states). For the channel at hand, the detection algorithm simplifies due to the statistical independence of the channel outputs conditional on the value of jitter. This value can be found by analyzing all channel outputs using the law of large numbers.

We analyzed the performance of Reed-Solomon codes applied to the global jitter channel without interleaving channel inputs corresponding to different moments of time. We discovered, both numerically and theoretically, that in the limit of a large probe array any non-interleaved Reed-Solomon code exhibits an error floor whose position is independent of the code's block size $N$ and only weakly dependent on the code rate. This is a surprising result, as the phenomenon of the error floor is usually associated with sub-optimality of the decoding algorithm (e.g. belief propagation) rather than with specifics of the channel. For the case of global jitter, the origin of the error floor can be traced back to rare instances of strong jitter $J$ such that the signal amplitude $\rho(J)$ becomes of the same order as the channel noise $\sigma$.



Motivated by these findings, we addressed the following question: are there any good error correction codes for probe storage channels? The answer turned out to be two-fold. Firstly, we calculated the capacity $C_{i.u.d.}$ of the probe storage channel and discovered that it does approach one bit per channel symbol in the limit of weak channel noise $\sigma$, albeit very slowly: $1 - C_{i.u.d.}$ approaches zero as $\sigma^{\gamma}$, where the exponent $\gamma$ is equal to the error floor exponent for Reed-Solomon codes. However, we also found that Gallager's random coding bound for channel encoding which does not interleave between different time slices exhibits exactly the same error floor as Reed-Solomon codes! Moreover, an application of Fano's inequality allowed us to prove that any non-interleaved error correction code applied to a massively parallel probe storage channel exhibits an error floor behaviour which is at least as bad as the Reed-Solomon error floor.

With the benefit of hindsight, the appearance of the universal error floor in the Reed-Solomon block error rate, Shannon capacity, Gallager's random coding bound and Fano's lower bound on the sector error rate for an arbitrary fixed code can be related to the statistics of large jitter fluctuations: if $\rho(J) \ll \sigma$, then regardless of the code used, the regions of confusability (hyperspheres of radius $\sigma$) of any two codewords intersect and all information and parity is lost!

In this sense the global jitter channel with weak channel noise can be viewed as an effective block-wise erasure channel: either $N$ bits are detected with no errors, or all $N$ bits are lost. For such a channel, it is clearly impossible to improve performance by increasing the number of probes in the array if the encoding block size is kept equal to $N$. In order to reach Shannon's limit for the global jitter channel, it is necessary to spread information between many outputs of the probe array, thus mitigating the effects of occasional strong jitter. We estimate the necessary block size of good error correction codes for the probe storage channel to be of the order of $10^6$ bits, an unusually large number for a non-removable storage device.

The mathematical results reported in the paper have severe practical implications for probe storage. Either the positioning system must be accurate enough that global jitter is made very weak, or we must interleave the error correction code in the time direction. The former option poses a huge engineering challenge: for indentation sizes of the order of 10 nm, our results suggest that the precision of the positioning system must be significantly better than 1 nm for the effects of global jitter to become insignificant. The latter option also presents significant implementation difficulties, especially since jitter is known to be strongly correlated in the time direction and, as a result, we must interleave very deeply to overcome its effects.

Due to the universal nature of information theoretic per- 
formance limits we discovered, we see no easy 'engineering' 
way around the problem of global jitter. For example one can 
attempt to re-read the information recovered by the array of 
probes following a strong jitter event. This is similar to the 
way off-track errors are dealt with in magnetic hard drives. 
Our results mean however that the throughput of such a system 
will be severely degraded due to a large number of necessary 
re-reads. 

From a more theoretical point of view, we developed a
general approach to the performance evaluation of channels
with long-range correlations: if the correlations are due to a
small number of hidden 'correlation' parameters (e.g. global
jitter), we can calculate the conditional capacity, the random
coding bound, etc. using well-known expressions for memoryless
channels and then average over the hidden parameters to obtain
the unconditional quantities. The universality of our answers
can be explained by the fact that, for a large number of
sub-channels N ≫ 1, only the values of the hidden parameters
dictated by the law of large numbers are needed to remove the
conditioning.

Given the general nature of our analysis, we hope that the
results reported in the present paper can be applied to other
communication channels with long-range correlations between
channel outputs. Relevant examples include high-density
magnetic storage, for which long correlations (of the order of
several sectors) are present due to off-track errors; optical
storage with removable media, for which correlations are
generated by scratches on the media surface; and ultra-dense
NAND flash memory (corresponding to 20 nm and smaller transistor
libraries), where long correlations along both bit- and word-lines
occur due to read and program disturbs. An example coming from
digital communications is a multiple-input-multiple-output (MIMO)
system of receivers and transmitters which is known to be
strongly affected by the spatial correlations in the noise vector, 



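Appendix A
The derivation of bound (12)

Let $\hat a^{(k)}$ and $\tilde a^{(k)}$ be the k-th output of the genie detector
(given by equation (8)) and the k-th output of the LLN detector
(given by equation (10)) respectively. Then, using the triangle
inequality and the laws of conditional probability, it is possible
to show:

$$\Pr\left(\hat a^{(k)} \neq \tilde a^{(k)}\right) \le \int d\mu_p(\pi)\,\Pr\left(\hat a^{(k)} \neq \tilde a^{(k)} \,\middle|\, p = \pi\right) \tag{78}$$

The event corresponding to the outputs of the two detectors
being different can be expressed as follows:

$$\begin{aligned}
\Pr\left(\hat a^{(k)} \neq \tilde a^{(k)} \,\middle|\, p = \pi\right)
={}& \Pr\left(\hat a^{(k)} \neq \tilde a^{(k)} \,\middle|\, \left|\tfrac{p}{2} - T_N\right| > \epsilon,\ p = \pi\right)
\Pr\left(\left|\tfrac{p}{2} - T_N\right| > \epsilon \,\middle|\, p = \pi\right) \\
&+ \Pr\left(\hat a^{(k)} \neq \tilde a^{(k)} \,\middle|\, \left|\tfrac{p}{2} - T_N\right| < \epsilon,\ p = \pi\right)
\Pr\left(\left|\tfrac{p}{2} - T_N\right| < \epsilon \,\middle|\, p = \pi\right)
\end{aligned} \tag{79}$$

We proceed to bound each term as follows:

$$\Pr\left(\hat a^{(k)} \neq \tilde a^{(k)} \,\middle|\, \left|\tfrac{p}{2} - T_N\right| > \epsilon,\ p = \pi\right) \le 1 \tag{80}$$

$$\Pr\left(\left|\tfrac{p}{2} - T_N\right| < \epsilon \,\middle|\, p = \pi\right) \le 1 \tag{81}$$

$$\Pr\left(\left|\tfrac{p}{2} - T_N\right| > \epsilon \,\middle|\, p = \pi\right) \le \frac{\operatorname{Var}\left(\tfrac{p}{2} - T_N \,\middle|\, p = \pi\right)}{\epsilon^2} \tag{82}$$

$$\Pr\left(\hat a^{(k)} \neq \tilde a^{(k)} \,\middle|\, \left|\tfrac{p}{2} - T_N\right| < \epsilon,\ p = \pi\right) \le \frac{2\epsilon}{\sqrt{2\pi}\,\sigma} \tag{83}$$

where (80) and (81) are trivial, (82) is due to the central
limit theorem (and the convergence of $T_N$ to $p/2$), and (83) is
a simple upper bound on the corresponding Gaussian integral.
Substituting the resulting upper bound into (78) and integrating
over $\pi$, we find:

$$\Pr\left(\hat a^{(k)} \neq \tilde a^{(k)}\right) \le \frac{\mathbb{E}(p^2)/4 + \sigma^2}{N\epsilon^2} + \frac{2\epsilon}{\sqrt{2\pi}\,\sigma} \tag{84}$$

We now minimise with respect to $\epsilon$ to obtain the tightest bound
possible. The critical value of $\epsilon$ is given by:

$$\epsilon_{\min} = \left(\frac{\left(\mathbb{E}(p^2)/4 + \sigma^2\right)\sqrt{2\pi}\,\sigma}{N}\right)^{1/3} \tag{85}$$

Substituting $\epsilon_{\min}$ into (84), we arrive at the bound (12).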
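As a check on (84) and (85), the minimised right-hand side can be written out explicitly. Writing $A = \mathbb{E}(p^2)/4 + \sigma^2$ (a shorthand introduced here), at $\epsilon_{\min}$ the first term of (84) equals half of the second, so

$$\min_{\epsilon > 0}\left[\frac{A}{N\epsilon^2} + \frac{2\epsilon}{\sqrt{2\pi}\,\sigma}\right] = \frac{3\,\epsilon_{\min}}{\sqrt{2\pi}\,\sigma} = 3\left(\frac{A}{2\pi\sigma^2 N}\right)^{1/3}.$$

This is our own arithmetic from (84) and (85), not a transcription of (12). The Python sketch below confirms it by brute-force minimisation of the right-hand side of (84) on a fine grid, for illustrative values of $\mathbb{E}(p^2)$, $\sigma$ and $N$ that we chose.

```python
import numpy as np

def rhs_84(eps, A, sigma, N):
    """Right-hand side of (84), with A = E(p^2)/4 + sigma^2."""
    return A / (N * eps**2) + 2.0 * eps / (np.sqrt(2.0 * np.pi) * sigma)

def eps_min(A, sigma, N):
    """Critical value of epsilon from (85)."""
    return (A * np.sqrt(2.0 * np.pi) * sigma / N) ** (1.0 / 3.0)

def closed_form_min(A, sigma, N):
    """Minimum of (84) obtained by substituting (85): 3 (A / (2 pi sigma^2 N))^(1/3)."""
    return 3.0 * (A / (2.0 * np.pi * sigma**2 * N)) ** (1.0 / 3.0)

# Illustrative values chosen for this sketch, not taken from the paper.
sigma, N, mean_p2 = 0.1, 4000, 0.3
A = mean_p2 / 4.0 + sigma**2

# Brute-force minimisation on a fine logarithmic grid of epsilon values.
eps_grid = np.logspace(-4, 0, 200001)
values = rhs_84(eps_grid, A, sigma, N)
numeric_min = float(values.min())
numeric_argmin = float(eps_grid[np.argmin(values)])
```

Because the minimum scales as $N^{-1/3}$, halving this bound on the disagreement probability requires an eightfold increase in $N$.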



Appendix B
The calculation of $\lim_{N\to\infty} \frac{1}{N} I(P;(A,R))$

We will prove that $\lim_{N\to\infty} \frac{1}{N} I(P;(A,R)) = 0$ not just for
Gaussian jitter, but for any distribution of $p \in (0,1)$ with finite
differential entropy:

$$h = \mathbb{E}_p\left|\log \frac{d\mu_p}{dp}(p)\right| < \infty. \tag{86}$$



By definition, the mutual information between signal 
strength P and the channel's input and output (A, R) is: 



$$I(P;(A,R)) = \mathbb{E}_{P,A,R}\log\frac{d\Pr(A,R \mid P)}{d\Pr(A,R)}. \tag{87}$$



Using the independence of signal strength P and channel input 
A the above expression can be re-written as follows: 



$$I(P;(A,R)) = \mathbb{E}_{P,A,R}\log\frac{d\Pr(R \mid A,P)}{d\Pr(R \mid A)}. \tag{88}$$



For Gaussian additive channel noise, the fraction under the 
sign of the logarithm takes the form 



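$$F(P,A,R) \equiv \frac{d\Pr(R \mid A,P)}{d\Pr(R \mid A)} = \left[\int d\mu_p(q)\, e^{-\frac{N}{2\sigma^2}\left[2X(P-q) + Y(q^2-P^2)\right]}\right]^{-1}, \tag{89}$$

where

$$X = \frac{1}{N}\sum_{k=1}^{N} a^{(k)} r^{(k)}, \tag{90}$$

$$Y = \frac{1}{N}\sum_{k=1}^{N} \left(a^{(k)}\right)^2. \tag{91}$$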

Conditionally on P, the random variables X and Y are
normalised sums of independent, identically distributed random
variables, and



$$\mathbb{E}(X \mid P = p) = \frac{p}{2}, \tag{92}$$

$$\mathbb{E}(Y \mid P = p) = \frac{1}{2}. \tag{93}$$



Therefore, conditionally on P, X and Y converge strongly 
to their respective expectation values. This motivates the 
introduction of new random variables strongly converging to 
zero: 



$$X_N = X - \frac{P}{2}, \qquad Y_N = Y - \frac{1}{2}. \tag{94}$$
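The convergence behind (92) to (94) is easy to see in simulation. Assuming $r^{(k)} = p\,a^{(k)} + n^{(k)}$ with $a^{(k)}$ uniform on $\{0,1\}$ and Gaussian noise of variance $\sigma^2$ (our assumption for illustration), a direct calculation gives $N\,\mathbb{E}(X_N^2 + Y_N^2 \mid P = p) = p^2/4 + \sigma^2/2 + 1/4$, so $X_N$ and $Y_N$ shrink like $N^{-1/2}$. The sketch below, with illustrative values of $p$ and $\sigma$ chosen by us and a hypothetical helper `sample_xn_yn`, estimates $\sqrt{N}$ times the root-mean-square size of $(X_N, Y_N)$ for several $N$ and compares it with that prediction.

```python
import numpy as np

def sample_xn_yn(p, sigma, N, trials, rng):
    """Draw `trials` independent copies of (X_N, Y_N) from (94), ASSUMING
    r_k = p*a_k + n_k, a_k uniform on {0, 1}, n_k ~ N(0, sigma^2)."""
    a = rng.integers(0, 2, size=(trials, N)).astype(float)
    r = p * a + sigma * rng.normal(size=(trials, N))
    X = np.mean(a * r, axis=1)      # (90), one value per trial
    Y = np.mean(a**2, axis=1)       # (91), one value per trial
    return X - p / 2.0, Y - 0.5     # centred as in (94)

rng = np.random.default_rng(2)
p, sigma = 0.6, 0.2
scaled_rms = {}
for N in (100, 400, 1600, 6400):
    xn, yn = sample_xn_yn(p, sigma, N, trials=300, rng=rng)
    # sqrt(N) * RMS of (X_N, Y_N); should stay near sqrt(p^2/4 + sigma^2/2 + 1/4).
    scaled_rms[N] = float(np.sqrt(N * np.mean(xn**2 + yn**2)))
predicted = float(np.sqrt(p**2 / 4 + sigma**2 / 2 + 0.25))
```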



Using $X_N, Y_N$, the problem of calculating
$\lim_{N\to\infty} \frac{1}{N} I(P;(A,R))$ can be formulated as follows:
compute



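$$L = \lim_{N\to\infty} \frac{1}{N}\,\mathbb{E}_{P,X_N,Y_N}\log F(P,X_N,Y_N), \tag{95}$$

where

$$F(P,X_N,Y_N) = \left[\int d\mu_p(q)\, e^{-\frac{N}{2\sigma^2}\left[\frac{(P-q)^2}{2} + 2X_N(P-q) + Y_N(q^2-P^2)\right]}\right]^{-1}.$$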



Let us fix $\epsilon > 0$. A calculation that essentially repeats the
derivation of the Chebyshev inequality shows that



$$\Pr\left(X_N^2 + Y_N^2 > \epsilon \mid P = p\right) < \frac{1}{2\epsilon^2 N}. \tag{96}$$



Now let us re-write L using the partition of unity as the sum 
of two terms: 



$$L = \lim_{N\to\infty}\left(A_N + B_N\right), \tag{97}$$



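where

$$A_N = \frac{1}{N}\,\mathbb{E}_{P,X_N,Y_N}\left[\mathbf{1}_{X_N^2+Y_N^2 < \epsilon}\,\log F(P,X_N,Y_N)\right], \tag{98}$$

$$B_N = \frac{1}{N}\,\mathbb{E}_{P,X_N,Y_N}\left[\mathbf{1}_{X_N^2+Y_N^2 > \epsilon}\,\log F(P,X_N,Y_N)\right]. \tag{99}$$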



Under the condition that the differential entropy (the logarithmic
moment) of $\mu_p$ exists, it is easy to show that



$$|A_N| < \tfrac{1}{2}\sigma^{-2}\epsilon.$$



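Similarly,

$$\begin{aligned}
|B_N| &\le \frac{1}{4\sigma^2}\,\mathbb{E}_{X_N,Y_N,P}\left[\left(2|X_N| + |Y_N|\right)\mathbf{1}_{X_N^2+Y_N^2 > \epsilon}\right] \\
&\le \frac{1}{4\sigma^2}\sqrt{\mathbb{E}_{X_N,Y_N,P}\left(2|X_N| + |Y_N|\right)^2}\,\sqrt{\Pr\left(X_N^2 + Y_N^2 > \epsilon\right)},
\end{aligned} \tag{100}$$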



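where the last line is obtained using the Schwarz inequality. Using
(96) and the fact that $\mathbb{E}_{X_N,Y_N,P}\left(2|X_N| + |Y_N|\right)^2$ is of order
$N^{-1}$ (conditionally on $P$, $X_N$ and $Y_N$ are centred averages of $N$
independent, identically distributed terms with finite variance, see (90)
and (91)), we find that

$$|B_N| \le \frac{1}{4\sqrt{2}\,\sigma^2 N\epsilon}\sqrt{N\,\mathbb{E}_{X_N,Y_N,P}\left(2|X_N| + |Y_N|\right)^2}. \tag{101}$$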



The expectation value in the above expression always exists,
as the random variable P is bounded. Combining (100) and
(101), we find that for any $\epsilon > 0$ and any N,



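$$|A_N + B_N| \le C\left(\epsilon + \frac{1}{N\epsilon}\right)$$

for some $\epsilon$-, N-independent constant C. Choosing $\epsilon = N^{-1/2}$
and taking the limit $N \to \infty$, we find that

$$L = \lim_{N\to\infty}\left(A_N + B_N\right) = 0.$$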
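The conclusion $L = 0$ can also be illustrated numerically. The sketch below estimates $\frac{1}{N}I(P;(A,R)) = \frac{1}{N}\mathbb{E}\log F$ by Monte Carlo, evaluating $\log F$ through (89). It assumes $r^{(k)} = p\,a^{(k)} + n^{(k)}$ with $a^{(k)}$ uniform on $\{0,1\}$ and Gaussian noise of variance $\sigma^2$, and it replaces the jitter distribution by a hypothetical uniform prior on a fine grid of signal strengths; `mi_per_probe` is our helper name. A $K$-point grid caps $I(P;(A,R))$ at $\log K$, so the grid spacing must stay well below the posterior width of $p$, which is roughly $\sigma\sqrt{2/N}$ here.

```python
import numpy as np

def log_sum_exp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def mi_per_probe(N, sigma, q_grid, samples, rng):
    """Monte Carlo estimate (in nats) of (1/N) I(P; (A, R)) = (1/N) E log F.
    ASSUMPTIONS: r_k = p*a_k + n_k, a_k uniform on {0, 1}, n_k ~ N(0, sigma^2),
    and a uniform prior on q_grid standing in for the jitter distribution."""
    log_w = np.full(q_grid.size, -np.log(q_grid.size))
    total = 0.0
    for _ in range(samples):
        p = q_grid[rng.integers(q_grid.size)]            # draw P from the prior
        a = rng.integers(0, 2, size=N).astype(float)     # channel inputs
        r = p * a + sigma * rng.normal(size=N)           # channel outputs
        X, Y = np.mean(a * r), np.mean(a**2)             # (90), (91)
        expo = -(N / (2.0 * sigma**2)) * (2.0 * X * (p - q_grid) + Y * (q_grid**2 - p**2))
        total += -log_sum_exp(log_w + expo)              # log F from (89)
    return total / (samples * N)

rng = np.random.default_rng(3)
q_grid = np.linspace(0.2, 0.8, 2001)
estimates = {N: mi_per_probe(N, 0.1, q_grid, samples=200, rng=rng) for N in (100, 1000, 4000)}
```

With these settings the per-probe estimate decreases as N grows, which is the behaviour $L = 0$ describes; the finite-N values by themselves say nothing about the rate in the bound above.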